Machine learning model repository

ABSTRACT

Embodiments are directed towards a machine learning repository for managing machine learning (ML) model envelopes, ML models, model objects, or the like. Questions and model objects may be received by a ML model answer engine. Machine learning (ML) model envelopes may be received based on the questions. The model objects may be compared to parameter models associated with the ML model envelopes. ML model envelopes may be selected based on the comparison such that the model objects satisfy the parameter models of each of the selected ML model envelopes. ML models included in each selected ML model envelope may be executed to provide score values for the model objects and the score values may be included in a report.

CROSS-REFERENCE TO RELATED APPLICATION(S)

This Utility Patent Application is a Continuation of U.S. patent application Ser. No. 15/799,322 filed on Oct. 31, 2017, now U.S. Pat. No. 10,275,710 issued on Apr. 30, 2019, the benefit of which is claimed under 35 U.S.C. § 120, and the contents of which is further incorporated in entirety by reference.

TECHNICAL FIELD

The present invention relates generally to machine learning and, more particularly, but not exclusively to methods for sharing or distributing machine learning models.

BACKGROUND

Machine learning is increasingly playing a larger and more important role in developing or improving the understanding of complex systems. As machine learning techniques have matured, machine learning has rapidly moved from the theoretical to the practical. Combined with the advent of big-data technology, machine learning solutions are being applied to a variety of industries and applications that until now were difficult, if not impossible, to effectively reason about. As such, there has been an explosion of the development of different types of machine learning models that may be used for predicting outcomes for different systems. In some cases, organizations may develop many machine learning models that may be directed to different question spaces. Also, organizations may be interested in borrowing machine learning models, sharing machine learning models, cooperatively developing machine learning models, or the like. However, machine learning models may often be developed using custom handcrafted designs tailored for individual data sets or for specific problems. Accordingly, practical re-use, sharing, or the like, of machine learning models may be difficult and impractical. Thus, it is with respect to these considerations and others that the invention has been made.

BRIEF DESCRIPTION OF THE DRAWINGS

Non-limiting and non-exhaustive embodiments of the present innovations are described with reference to the following drawings. In the drawings, like reference numerals refer to like parts throughout the various figures unless otherwise specified. For a better understanding of the described innovations, reference will be made to the following Detailed Description of Various Embodiments, which is to be read in association with the accompanying drawings, wherein:

FIG. 1 illustrates a system environment in which various embodiments may be implemented;

FIG. 2 shows a schematic embodiment of a client computer;

FIG. 3 illustrates a schematic embodiment of a network computer;

FIG. 4 shows a logical schematic of a portion of a machine learning model repository system arranged in accordance with one or more of the various embodiments;

FIG. 5 illustrates a logical system for ingesting customer data sets in accordance with one or more of the various embodiments;

FIG. 6 illustrates a logical system for ingesting data sets in accordance with one or more of the various embodiments;

FIG. 7 illustrates a logical system for representing model objects in accordance with one or more of the various embodiments;

FIG. 8A illustrates a first model object being added to a data model in accordance with one or more of the various embodiments;

FIG. 8B illustrates a second model object being added to a data model and associated with another model object via a relationship edge in accordance with one or more of the various embodiments;

FIG. 9 illustrates a logical representation of a machine learning (ML) model envelope for scoring model objects in accordance with one or more of the various embodiments;

FIG. 10 illustrates an overview flowchart for a process for a machine learning (ML) repository in accordance with one or more of the various embodiments;

FIG. 11 illustrates a flowchart for a process for data ingestion for a machine learning (ML) repository in accordance with one or more of the various embodiments;

FIG. 12 illustrates a flowchart for a process for employing a machine learning (ML) repository to answer questions in accordance with one or more of the various embodiments;

FIG. 13 illustrates a flowchart for a process for employing a machine learning (ML) repository to answer questions in accordance with one or more of the various embodiments; and

FIG. 14 illustrates a flowchart for a process for employing a machine learning (ML) repository to answer questions in accordance with one or more of the various embodiments.

DETAILED DESCRIPTION OF VARIOUS EMBODIMENTS

Various embodiments now will be described more fully hereinafter with reference to the accompanying drawings, which form a part hereof, and which show, by way of illustration, specific exemplary embodiments by which the invention may be practiced. The embodiments may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the embodiments to those skilled in the art. Among other things, the various embodiments may be methods, systems, media or devices. Accordingly, the various embodiments may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. The following detailed description is, therefore, not to be taken in a limiting sense.

Throughout the specification and claims, the following terms take the meanings explicitly associated herein, unless the context clearly dictates otherwise. The phrase “in one embodiment” as used herein does not necessarily refer to the same embodiment, though it may. Furthermore, the phrase “in another embodiment” as used herein does not necessarily refer to a different embodiment, although it may. Thus, as described below, various embodiments may be readily combined, without departing from the scope or spirit of the invention.

In addition, as used herein, the term “or” is an inclusive “or” operator, and is equivalent to the term “and/or,” unless the context clearly dictates otherwise. The term “based on” is not exclusive and allows for being based on additional factors not described, unless the context clearly dictates otherwise. Also, throughout the specification and the claims, the use of “when” and “responsive to” do not imply that associated resultant actions are required to occur immediately or within a particular time period. Instead they are used herein to indicate actions that may occur or be performed in response to one or more conditions being met, unless the context clearly dictates otherwise. In addition, throughout the specification, the meaning of “a,” “an,” and “the” include plural references. The meaning of “in” includes “in” and “on.”

For example embodiments, the following terms are also used herein according to the corresponding meaning, unless the context clearly dictates otherwise.

As used herein, the term “engine” refers to logic embodied in hardware or software instructions, which can be written in a programming language, such as C, C++, Objective-C, COBOL, Java™, PHP, Perl, Python, JavaScript, Ruby, VBScript, Microsoft .NET™ languages such as C#, and/or the like. An engine may be compiled into executable programs or written in interpreted programming languages. Software engines may be callable from other engines or from themselves. Engines described herein refer to one or more logical modules that can be merged with other engines or applications, or can be divided into sub-engines. The engines can be stored in non-transitory computer-readable medium or computer storage device and be stored on and executed by one or more general purpose computers, thus creating a special purpose computer configured to provide the engine.

As used herein, the terms “raw data set,” or “raw data” refer to data sets provided by an organization that may represent the items to be ingested for use in a machine learning repository. In some embodiments raw data may be provided in various formats. In simple cases, raw data may be provided in spreadsheets, databases, csv files, or the like. In other cases, raw data may be provided using structured XML files, tabular formats, JSON files, or the like. In one or more of the various embodiments, raw data in this context may be the product of one or more pre-processing operations. For example, one or more pre-processing operations may be executed on information, such as, log files, data dumps, event logs, database dumps, unstructured data, structured data, or the like, or combination thereof. In some cases, the pre-processing may include data cleansing, filtering, or the like. The particular pre-processing operations may be specialized based on the source, context, format, veracity of the information, or the like.

As used herein, the term “raw data objects” refers to objects that comprise raw datasets. For example, if a raw dataset is comprised of a plurality of tabular record sets, the separate tabular record sets may be considered raw data objects.

As used herein, the term “model object” refers to an object that models various characteristics of an entity or data object. Model objects may include one or more model object fields that represent features or characteristics. Model objects, model object fields, or model object relationships may be governed by a model schema.

As used herein, the term “model schema” refers to a schema that defines model object types, model object features, model object relationships, or the like, that may be supported by the machine learning repository. For example, raw data objects are transformed into model objects that conform to a model schema supported by the machine learning repository.

As used herein, the term “data model” refers to a data structure that represents one or more model objects and their relationships. A data model will conform to a model schema supported by the machine learning repository.

As used herein, the term “parameter model” refers to a data structure that represents one or more model objects and the relationships that a ML model envelope or ML model may be arranged to support. A data model that includes model objects may be provided to a ML model if the data model satisfies the requirements of the ML model's parameter model.

As used herein, the terms “machine learning model” or “ML model” refer to a machine learning model that is arranged for scoring or evaluating model objects. The particular type of ML model and the questions it is designed to answer will depend on the application for which the ML model is designed. ML models are associated with parameter models that define model objects that the ML model supports.

As used herein, the terms “machine learning model envelope,” or “ML model envelope” refer to a data structure that includes one or more ML models and a parameter model. A ML model envelope may be arranged to include the modules, code, scripts, programs, or the like, for implementing its one or more included ML models.
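The relationships among the terms defined above can be pictured with a minimal Python sketch. It is illustrative only: the class names, field names, and the example "student" and "course" objects are assumptions introduced here and are not taken from the described embodiments.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set, Tuple


@dataclass
class ModelObject:
    """An entity with a type and fields that represent its features."""
    object_id: str
    object_type: str
    fields: Dict[str, float] = field(default_factory=dict)


@dataclass
class DataModel:
    """Model objects plus relationship edges (source id, edge name, target id)."""
    objects: Dict[str, ModelObject] = field(default_factory=dict)
    edges: Set[Tuple[str, str, str]] = field(default_factory=set)

    def add(self, obj: ModelObject) -> None:
        self.objects[obj.object_id] = obj

    def relate(self, source_id: str, name: str, target_id: str) -> None:
        self.edges.add((source_id, name, target_id))


@dataclass
class ParameterModel:
    """Object types and fields that an ML model is arranged to support."""
    required_fields: Dict[str, Set[str]]  # object type -> required field names


@dataclass
class MLModelEnvelope:
    """One parameter model bundled with one or more ML models (plain callables here)."""
    name: str
    parameter_model: ParameterModel
    models: List[Callable[[ModelObject], float]] = field(default_factory=list)


# Hypothetical example data; none of these names come from the source.
data_model = DataModel()
data_model.add(ModelObject("s1", "student", {"gpa": 3.4}))
data_model.add(ModelObject("c1", "course", {"credits": 4.0}))
data_model.relate("s1", "enrolled_in", "c1")

envelope = MLModelEnvelope(
    name="retention",
    parameter_model=ParameterModel({"student": {"gpa"}}),
    models=[lambda obj: obj.fields["gpa"] / 4.0],  # stand-in for a trained model
)
score = envelope.models[0](data_model.objects["s1"])
```

In this sketch a model schema would be whatever constrains which object types, fields, and edge names `DataModel` accepts; it is left out to keep the example short.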

The following briefly describes the various embodiments to provide a basic understanding of some aspects of the invention. This brief description is not intended as an extensive overview. It is not intended to identify key or critical elements, or to delineate or otherwise narrow the scope. Its purpose is merely to present some concepts in a simplified form as a prelude to the more detailed description that is presented later.

Briefly stated, embodiments are directed towards a machine learning repository for managing machine learning (ML) model envelopes, ML models, model objects, or the like. In one or more of the various embodiments, a ML model answer engine may be instantiated to perform various actions as described below. In one or more of the various embodiments, query information may be provided based on the question and the model schema. And, in some embodiments, the query information may be executed to provide the one or more model objects.

In one or more of the various embodiments, one or more questions and one or more model objects may be received by the ML model answer engine such that the one or more model objects may be part of a data model that conforms to a model schema. In one or more of the various embodiments, receiving the one or more model objects may include providing one or more raw data objects in real-time. And, in some embodiments, the ingestion engine may be employed to transform the one or more raw data objects into one or more model objects.

In one or more of the various embodiments, a plurality of machine learning (ML) model envelopes may be received based on the one or more questions.

In one or more of the various embodiments, the data model may be compared to parameter models associated with each of the plurality of ML model envelopes such that the comparison includes a traversal of the data model and one or more of the parameter models. In one or more of the various embodiments, comparing the data model to the parameter models may include employing one or more indices to identify model objects that match the parameter models of the selected one or more ML model envelopes such that each model object identified using the one or more indices may be omitted from the one or more traversal paths of the data model.

In one or more of the various embodiments, one or more of the plurality of ML model envelopes may be selected based on the comparison such that the one or more traversal paths corresponding to the one or more model objects satisfy the parameter models of each of the selected one or more ML model envelopes.
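The comparison and selection described above can be sketched roughly as follows. This is a simplified illustration under stated assumptions: the data model is flattened to a dictionary of objects, a parameter model is reduced to required field names per object type, and the traversal paths and index-based shortcuts mentioned above are replaced by a plain scan. All names and example data are hypothetical.

```python
from typing import Dict, List, Set

# Assumed, simplified shapes (not from the source):
#   data model:      object id -> {"type": str, "fields": {name: value}}
#   parameter model: object type -> set of required field names
DataModel = Dict[str, dict]
ParameterModel = Dict[str, Set[str]]


def matching_objects(data_model: DataModel, parameter_model: ParameterModel) -> List[str]:
    """Return ids of model objects whose type and fields satisfy the parameter model."""
    matches = []
    for object_id, obj in data_model.items():
        required = parameter_model.get(obj["type"])
        if required is not None and required <= set(obj["fields"]):
            matches.append(object_id)
    return matches


def select_envelopes(data_model: DataModel,
                     envelopes: Dict[str, ParameterModel]) -> Dict[str, List[str]]:
    """Keep envelopes for which every required object type has a matching object."""
    selected = {}
    for name, parameter_model in envelopes.items():
        matches = matching_objects(data_model, parameter_model)
        matched_types = {data_model[object_id]["type"] for object_id in matches}
        if matched_types == set(parameter_model):
            selected[name] = matches
    return selected


data_model = {
    "s1": {"type": "student", "fields": {"gpa": 3.4, "age": 20}},
    "c1": {"type": "course", "fields": {"credits": 4}},
}
envelopes = {
    "retention": {"student": {"gpa"}},
    "salary": {"employee": {"title"}},
    "load": {"student": {"gpa"}, "course": {"credits", "level"}},
}
selected = select_envelopes(data_model, envelopes)
```

Here only the "retention" envelope is selected: "salary" needs an object type the data model lacks, and "load" needs a "level" field that the course object does not carry.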

In one or more of the various embodiments, one or more ML models included in each selected ML model envelope may be executed to provide score values for the one or more model objects and the score values may be included in a report. In one or more of the various embodiments, executing the one or more ML models may include distributing the execution of the one or more ML models to one or more network computers such that two or more ML models that have an affinity to each other may be distributed to the same network computer.
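One way to picture affinity-aware distribution is the sketch below. How affinity is determined is not specified in this passage, so the sketch assumes each ML model carries a string affinity key and that models sharing a key should land on the same network computer; the round-robin assignment of groups, the model names, and the computer names are likewise assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, List


def assign_by_affinity(model_affinity: Dict[str, str],
                       computers: List[str]) -> Dict[str, str]:
    """Map each model name to a computer so models sharing an affinity key share a computer."""
    groups = defaultdict(list)
    for model_name, affinity_key in model_affinity.items():
        groups[affinity_key].append(model_name)

    placement = {}
    # Assign whole affinity groups round-robin across the available computers.
    for index, affinity_key in enumerate(sorted(groups)):
        computer = computers[index % len(computers)]
        for model_name in groups[affinity_key]:
            placement[model_name] = computer
    return placement


# Hypothetical models and network computers.
placement = assign_by_affinity(
    {"churn_v1": "billing", "churn_v2": "billing", "fraud": "payments"},
    ["node-a", "node-b"],
)
```

Grouping before assignment guarantees co-location of related models; a real scheduler would likely also weigh load, data locality, or resource limits, which this sketch ignores.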

In one or more of the various embodiments, a model training engine may be instantiated to perform actions, including: receiving a training data set comprised of a plurality of model objects such that the plurality of model objects include one or more of model objects provided in a customer data set, or model objects provided in a validation data set; and training the one or more of the ML models included in the plurality of ML model envelopes using the training data set.

In one or more of the various embodiments, an ingestion engine may be instantiated to perform actions, including: receiving one or more raw data objects; and executing one or more ingestion rules to transform the one or more raw data objects into the one or more model objects based on the model schema.
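The ingestion step can be sketched as a list of rules applied to each raw data object, with the result coerced to the model schema. The schema layout, the rule function, and the column names "Student Name" and "GPA" are hypothetical and chosen only to make the example concrete.

```python
from typing import Callable, Dict, List, Optional

RawObject = Dict[str, str]
ModelObject = Dict[str, object]
IngestionRule = Callable[[RawObject], Optional[ModelObject]]

# Assumed model schema: object type -> field name -> converter for that field.
MODEL_SCHEMA = {"student": {"name": str, "gpa": float}}


def student_rule(raw: RawObject) -> Optional[ModelObject]:
    """Map a raw record with 'Student Name' and 'GPA' columns to a student model object."""
    if "Student Name" not in raw or "GPA" not in raw:
        return None  # this rule does not apply to the raw data object
    return {"type": "student",
            "fields": {"name": raw["Student Name"].strip(), "gpa": raw["GPA"]}}


def conform(obj: ModelObject, schema: dict) -> ModelObject:
    """Coerce fields to schema types; a KeyError signals an unsupported type or field."""
    field_types = schema[obj["type"]]
    fields = {name: field_types[name](value) for name, value in obj["fields"].items()}
    return {"type": obj["type"], "fields": fields}


def ingest(raw_objects: List[RawObject], rules: List[IngestionRule],
           schema: dict) -> List[ModelObject]:
    """Apply the first matching rule to each raw data object; skip objects no rule handles."""
    model_objects = []
    for raw in raw_objects:
        for rule in rules:
            candidate = rule(raw)
            if candidate is not None:
                model_objects.append(conform(candidate, schema))
                break
    return model_objects


model_objects = ingest(
    [{"Student Name": " Ada ", "GPA": "3.9"}, {"unrelated": "x"}],
    [student_rule],
    MODEL_SCHEMA,
)
```

Keeping rules as small functions lets source-specific cleanup stay separate from schema enforcement, which is handled once in `conform`.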

In one or more of the various embodiments, an ingestion engine may be instantiated to perform actions, including: adding one or more model objects to a customer data set; and modifying one or more other data models that include the added model objects based on their relationships with other previously provided model objects.

Illustrated Operating Environment

FIG. 1 shows components of one embodiment of an environment in which embodiments of the invention may be practiced. Not all the components may be required to practice the invention, and variations in the arrangement and type of the components may be made without departing from the spirit or scope of the invention. As shown, system 100 of FIG. 1 includes local area networks (LANs)/wide area networks (WANs)-(network) 110, wireless network 108, client computers 102-105, machine learning model repository server computer 116, one or more source data server computers 118, or the like.

At least one embodiment of client computers 102-105 is described in more detail below in conjunction with FIG. 2. In one embodiment, at least some of client computers 102-105 may operate over one or more wired and/or wireless networks, such as networks 108, and/or 110. Generally, client computers 102-105 may include virtually any computer capable of communicating over a network to send and receive information, perform various online activities, offline actions, or the like. In one embodiment, one or more of client computers 102-105 may be configured to operate within a business or other entity to perform a variety of services for the business or other entity. For example, client computers 102-105 may be configured to operate as a web server, firewall, client application, media player, mobile telephone, game console, desktop computer, or the like. However, client computers 102-105 are not constrained to these services and may also be employed, for example, for end-user computing in other embodiments. It should be recognized that more or fewer client computers (as shown in FIG. 1) may be included within a system such as described herein, and embodiments are therefore not constrained by the number or type of client computers employed.

Computers that may operate as client computer 102 may include computers that typically connect using a wired or wireless communications medium such as personal computers, multiprocessor systems, microprocessor-based or programmable electronic devices, network PCs, or the like. In some embodiments, client computers 102-105 may include virtually any portable computer capable of connecting to another computer and receiving information such as, laptop computer 103, mobile computer 104, tablet computers 105, or the like. However, portable computers are not so limited and may also include other portable computers such as cellular telephones, display pagers, radio frequency (RF) devices, infrared (IR) devices, Personal Digital Assistants (PDAs), handheld computers, wearable computers, integrated devices combining one or more of the preceding computers, or the like. As such, client computers 102-105 typically range widely in terms of capabilities and features. Moreover, client computers 102-105 may access various computing applications, including a browser, or other web-based application.

A web-enabled client computer may include a browser application that is configured to receive and to send web pages, web-based messages, and the like. The browser application may be configured to receive and display graphics, text, multimedia, and the like, employing virtually any web-based language, including wireless application protocol (WAP) messages, and the like. In one embodiment, the browser application is enabled to employ Handheld Device Markup Language (HDML), Wireless Markup Language (WML), WMLScript, JavaScript, Standard Generalized Markup Language (SGML), HyperText Markup Language (HTML), eXtensible Markup Language (XML), JavaScript Object Notation (JSON), or the like, to display and send a message. In one embodiment, a user of the client computer may employ the browser application to perform various activities over a network (online). However, another application may also be used to perform various online activities.

Client computers 102-105 also may include at least one other client application that is configured to receive and/or send content between another computer. The client application may include a capability to send and/or receive content, or the like. The client application may further provide information that identifies itself, including a type, capability, name, and the like. In one embodiment, client computers 102-105 may uniquely identify themselves through any of a variety of mechanisms, including an Internet Protocol (IP) address, a phone number, Mobile Identification Number (MIN), an electronic serial number (ESN), universally unique identifiers (UUIDs), or other device identifiers. Such information may be provided in a network packet, or the like, sent between other client computers, machine learning model repository server computer 116, one or more source data server computers 118, or other computers.

Client computers 102-105 may further be configured to include a client application that enables an end-user to log into an end-user account that may be managed by another computer, such as machine learning model repository server computer 116, one or more source data server computers 118, or the like. Such an end-user account, in one non-limiting example, may be configured to enable the end-user to manage one or more online activities, including in one non-limiting example, project management, software development, system administration, data modeling, search activities, social networking activities, browsing various websites, communicating with other users, or the like. Also, client computers may be arranged to enable users to display reports, interactive user-interfaces, and/or results provided by machine learning model repository server computer 116.

Wireless network 108 is configured to couple client computers 103-105 and their components with network 110. Wireless network 108 may include any of a variety of wireless sub-networks that may further overlay stand-alone ad-hoc networks, and the like, to provide an infrastructure-oriented connection for client computers 103-105. Such sub-networks may include mesh networks, Wireless LAN (WLAN) networks, cellular networks, and the like. In one embodiment, the system may include more than one wireless network.

Wireless network 108 may further include an autonomous system of terminals, gateways, routers, and the like connected by wireless radio links, and the like. These connectors may be configured to move freely and randomly and organize themselves arbitrarily, such that the topology of wireless network 108 may change rapidly.

Wireless network 108 may further employ a plurality of access technologies including 2nd (2G), 3rd (3G), 4th (4G), 5th (5G) generation radio access for cellular systems, WLAN, Wireless Router (WR) mesh, and the like. Access technologies such as 2G, 3G, 4G, 5G, and future access networks may enable wide area coverage for mobile computers, such as client computers 103-105 with various degrees of mobility. In one non-limiting example, wireless network 108 may enable a radio connection through a radio network access such as Global System for Mobile communication (GSM), General Packet Radio Services (GPRS), Enhanced Data GSM Environment (EDGE), code division multiple access (CDMA), time division multiple access (TDMA), Wideband Code Division Multiple Access (WCDMA), High Speed Downlink Packet Access (HSDPA), Long Term Evolution (LTE), and the like. In essence, wireless network 108 may include virtually any wireless communication mechanism by which information may travel between client computers 103-105 and another computer, network, a cloud-based network, a cloud instance, or the like.

Network 110 is configured to couple network computers with other computers, including, machine learning model repository server computer 116, one or more source data server computers 118, client computers 102-105 through wireless network 108, or the like. Network 110 is enabled to employ any form of computer readable media for communicating information from one electronic device to another. Also, network 110 can include the Internet in addition to local area networks (LANs), wide area networks (WANs), direct connections, such as through a universal serial bus (USB) port, other forms of computer-readable media, or any combination thereof. On an interconnected set of LANs, including those based on differing architectures and protocols, a router acts as a link between LANs, enabling messages to be sent from one to another. In addition, communication links within LANs typically include twisted wire pair or coaxial cable, while communication links between networks may utilize analog telephone lines, full or fractional dedicated digital lines including T1, T2, T3, and T4, and/or other carrier mechanisms including, for example, E-carriers, Integrated Services Digital Networks (ISDNs), Digital Subscriber Lines (DSLs), wireless links including satellite links, or other communications links known to those skilled in the art. Moreover, communication links may further employ any of a variety of digital signaling technologies, including without limit, for example, DS-0, DS-1, DS-2, DS-3, DS-4, OC-3, OC-12, OC-48, or the like. Furthermore, remote computers and other related electronic devices could be remotely connected to either LANs or WANs via a modem and temporary telephone link. In one embodiment, network 110 may be configured to transport information of an Internet Protocol (IP).

Additionally, communication media typically embodies computer readable instructions, data structures, program modules, or other transport mechanism and includes any information non-transitory delivery media or transitory delivery media. By way of example, communication media includes wired media such as twisted pair, coaxial cable, fiber optics, wave guides, and other wired media and wireless media such as acoustic, RF, infrared, and other wireless media.

One embodiment of machine learning model repository server computer 116 is described in more detail below in conjunction with FIG. 3. Briefly, however, machine learning model repository server computer 116 includes virtually any network computer that is specialized to provide data modeling services as described herein.

Although FIG. 1 illustrates machine learning model repository server computer 116 as a single computer, the innovations and/or embodiments are not so limited. For example, one or more functions of machine learning model repository server computer 116, or the like, may be distributed across one or more distinct network computers. Moreover, machine learning model repository server computer 116 is not limited to a particular configuration such as the one shown in FIG. 1. Thus, in one embodiment, machine learning model repository server computer 116 may be implemented using a plurality of network computers. In other embodiments, server computers may be implemented using a plurality of network computers in a cluster architecture, a peer-to-peer architecture, or the like. Further, in at least one of the various embodiments, machine learning model repository server computer 116 may be implemented using one or more cloud instances in one or more cloud networks. Accordingly, these innovations and embodiments are not to be construed as being limited to a single environment, and other configurations, and architectures are also envisaged.

Illustrative Client Computer

FIG. 2 shows one embodiment of client computer 200 that may include many more or fewer components than those shown. Client computer 200 may represent, for example, at least one embodiment of mobile computers or client computers shown in FIG. 1.

Client computer 200 may include one or more processors, such as processor 202 in communication with memory 204 via bus 228. Client computer 200 may also include power supply 230, network interface 232, audio interface 256, display 250, keypad 252, illuminator 254, video interface 242, input/output interface 238, haptic interface 264, global positioning systems (GPS) receiver 258, open air gesture interface 260, temperature interface 262, camera(s) 240, projector 246, pointing device interface 266, processor-readable stationary storage device 234, and processor-readable removable storage device 236. Client computer 200 may optionally communicate with a base station (not shown), or directly with another computer. And in one embodiment, although not shown, a gyroscope, accelerometer, or the like may be employed within client computer 200 to measure and/or maintain an orientation of client computer 200.

Power supply 230 may provide power to client computer 200. A rechargeable or non-rechargeable battery may be used to provide power. The power may also be provided by an external power source, such as an AC adapter or a powered docking cradle that supplements and/or recharges the battery.

Network interface 232 includes circuitry for coupling client computer 200 to one or more networks, and is constructed for use with one or more communication protocols and technologies including, but not limited to, protocols and technologies that implement any portion of the OSI model for mobile communication (GSM), CDMA, time division multiple access (TDMA), UDP, TCP/IP, SMS, MMS, GPRS, WAP, UWB, WiMax, SIP/RTP, GPRS, EDGE, WCDMA, LTE, UMTS, OFDM, CDMA2000, EV-DO, HSDPA, or any of a variety of other wireless communication protocols. Network interface 232 is sometimes known as a transceiver, transceiving device, or network interface card (NIC).

Audio interface 256 may be arranged to produce and receive audio signals such as the sound of a human voice. For example, audio interface 256 may be coupled to a speaker and microphone (not shown) to enable telecommunication with others and/or generate an audio acknowledgement for some action. A microphone in audio interface 256 can also be used for input to or control of client computer 200, e.g., using voice recognition, detecting touch based on sound, and the like.

Display 250 may be a liquid crystal display (LCD), gas plasma, electronic ink, electronic paper, light emitting diode (LED), Organic LED (OLED) or any other type of light reflective or light transmissive display that can be used with a computer. Display 250 may also include a touch interface 244 arranged to receive input from an object such as a stylus or a digit from a human hand, and may use resistive, capacitive, surface acoustic wave (SAW), infrared, radar, or other technologies to sense touch and/or gestures.

Projector 246 may be a remote handheld projector or an integrated projector that is capable of projecting an image on a remote wall or any other reflective object such as a remote screen.

Video interface 242 may be arranged to capture video images, such as a still photo, a video segment, an infrared video, or the like. For example, video interface 242 may be coupled to a digital video camera, a web-camera, or the like. Video interface 242 may comprise a lens, an image sensor, and other electronics. Image sensors may include a complementary metal-oxide-semiconductor (CMOS) integrated circuit, charge-coupled device (CCD), or any other integrated circuit for sensing light.

Keypad 252 may comprise any input device arranged to receive input from a user. For example, keypad 252 may include a push button numeric dial, or a keyboard. Keypad 252 may also include command buttons that are associated with selecting and sending images.

Illuminator 254 may provide a status indication and/or provide light. Illuminator 254 may remain active for specific periods of time or in response to events. For example, when illuminator 254 is active, it may backlight the buttons on keypad 252 and stay on while the client computer is powered. Also, illuminator 254 may backlight these buttons in various patterns when particular actions are performed, such as dialing another client computer. Illuminator 254 may also cause light sources positioned within a transparent or translucent case of the client computer to illuminate in response to actions.

Further, client computer 200 may also comprise hardware security module (HSM) 268 for providing additional tamper resistant safeguards for generating, storing and/or using security/cryptographic information such as, keys, digital certificates, passwords, passphrases, two-factor authentication information, or the like. In some embodiments, hardware security module may be employed to support one or more standard public key infrastructures (PKI), and may be employed to generate, manage, and/or store key pairs, or the like. In some embodiments, HSM 268 may be arranged as a hardware card that may be added to a client computer.

Client computer 200 may also comprise input/output interface 238 for communicating with external peripheral devices or other computers such as other client computers and network computers. The peripheral devices may include an audio headset, display screen glasses, remote speaker system, remote speaker and microphone system, and the like. Input/output interface 238 can utilize one or more technologies, such as Universal Serial Bus (USB), Infrared, WiFi, WiMax, Bluetooth™, Bluetooth Low Energy, or the like.

Haptic interface 264 may be arranged to provide tactile feedback to a user of the client computer. For example, the haptic interface 264 may be employed to vibrate client computer 200 in a particular way when another user of a computer is calling. Open air gesture interface 260 may sense physical gestures of a user of client computer 200, for example, by using single or stereo video cameras, radar, a gyroscopic sensor inside a computer held or worn by the user, or the like. Camera 240 may be used to track physical eye movements of a user of client computer 200.

In at least one of the various embodiments, client computer 200 may also include sensors 262 for determining geolocation information (e.g., GPS), monitoring electrical power conditions (e.g., voltage sensors, current sensors, frequency sensors, and so on), monitoring weather (e.g., thermostats, barometers, anemometers, humidity detectors, precipitation scales, or the like), light monitoring, audio monitoring, motion sensors, or the like. Sensors 262 may be one or more hardware sensors that collect and/or measure data that is external to client computer 200.

GPS transceiver 258 can determine the physical coordinates of client computer 200 on the surface of the Earth, and typically outputs a location as latitude and longitude values. GPS transceiver 258 can also employ other geo-positioning mechanisms, including, but not limited to, triangulation, assisted GPS (AGPS), Enhanced Observed Time Difference (E-OTD), Cell Identifier (CI), Service Area Identifier (SAI), Enhanced Timing Advance (ETA), Base Station Subsystem (BSS), or the like, to further determine the physical location of client computer 200 on the surface of the Earth. It is understood that under different conditions, GPS transceiver 258 can determine a physical location for client computer 200. In at least one embodiment, however, client computer 200 may, through other components, provide other information that may be employed to determine a physical location of the client computer, including for example, a Media Access Control (MAC) address, IP address, and the like.

In at least one of the various embodiments, applications, such as, machine learning client application 222, web browser 226, or the like, may be arranged to employ geo-location information to select one or more localization features, such as, time zones, languages, currencies, calendar formatting, or the like. Localization features may be used in user-interfaces, reports, as well as internal processes and/or databases. In at least one of the various embodiments, geo-location information used for selecting localization information may be provided by GPS transceiver 258. Also, in some embodiments, geolocation information may include information provided using one or more geolocation protocols over the networks, such as, wireless network 108 and/or network 111.

Human interface components can be peripheral devices that are physically separate from client computer 200, allowing for remote input and/or output to client computer 200. For example, information routed as described here through human interface components such as display 250 or keypad 252 can instead be routed through network interface 232 to appropriate human interface components located remotely. Examples of human interface peripheral components that may be remote include, but are not limited to, audio devices, pointing devices, keypads, displays, cameras, projectors, and the like. These peripheral components may communicate over a Pico Network such as Bluetooth™, Zigbee™, Bluetooth Low Energy, or the like. One non-limiting example of a client computer with such peripheral human interface components is a wearable computer, which might include a remote pico projector along with one or more cameras that remotely communicate with a separately located client computer to sense a user's gestures toward portions of an image projected by the pico projector onto a reflective surface such as a wall or the user's hand.

A client computer may include web browser application 226 that may be configured to receive and to send web pages, web-based messages, graphics, text, multimedia, and the like. The client computer's browser application may employ virtually any programming language, including wireless application protocol (WAP) messages, and the like. In at least one embodiment, the browser application is enabled to employ Handheld Device Markup Language (HDML), Wireless Markup Language (WML), WMLScript, JavaScript, Standard Generalized Markup Language (SGML), HyperText Markup Language (HTML), eXtensible Markup Language (XML), HTML5, and the like.

Memory 204 may include RAM, ROM, and/or other types of memory. Memory 204 illustrates an example of computer-readable storage media (devices) for storage of information such as computer-readable instructions, data structures, program modules or other data. Memory 204 may store Unified Extensible Firmware Interface (UEFI) 208 for controlling low-level operation of client computer 200. The memory may also store operating system 206 for controlling the operation of client computer 200. It will be appreciated that this component may include a general-purpose operating system such as a version of UNIX, or LINUX™, or a specialized client computer communication operating system such as Windows Phone™. The operating system may include, or interface with, Java and/or JavaScript virtual machine modules that enable control of hardware components and/or operating system operations via Java application programs or JavaScript programs.

Memory 204 may further include one or more data storage 210, which can be utilized by client computer 200 to store, among other things, applications 220 and/or other data. For example, data storage 210 may also be employed to store information that describes various capabilities of client computer 200. The information may then be provided to another device or computer based on any of a variety of events, including being sent as part of a header during a communication, sent upon request, or the like. Data storage 210 may also be employed to store social networking information including address books, buddy lists, aliases, user profile information, user credentials, or the like. Data storage 210 may further include program code, data, algorithms, and the like, for use by a processor, such as processor 202 to execute and perform actions. In one embodiment, at least some of data storage 210 might also be stored on another component of client computer 200, including, but not limited to, non-transitory processor-readable removable storage device 236, processor-readable stationary storage device 234, or even external to the client computer.

Applications 220 may include computer executable instructions which, when executed by client computer 200, transmit, receive, and/or otherwise process instructions and data. Applications 220 may include, for example, machine learning client application 222, web browser 226, or the like. In at least one of the various embodiments, machine learning client application 222 may be used to interact with a machine learning model repository.

Other examples of application programs include calendars, search programs, email client applications, IM applications, SMS applications, Voice Over Internet Protocol (VOIP) applications, contact managers, task managers, transcoders, database programs, word processing programs, security applications, spreadsheet programs, games, and so forth.

Additionally, in one or more embodiments (not shown in the figures), client computer 200 may include one or more embedded logic hardware devices instead of one or more CPUs, such as, Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs), Programmable Array Logic (PAL), or the like, or combination thereof. The embedded logic hardware devices may directly execute embedded logic to perform actions. Also, in one or more embodiments (not shown in the figures), the client computer may include one or more hardware microcontrollers instead of one or more CPUs. In at least one embodiment, the microcontrollers may be systems-on-a-chip (SOCs) that may directly execute their own embedded logic to perform actions and access their own internal memory and their own external Input and Output Interfaces (e.g., hardware pins and/or wireless transceivers) to perform actions.

Illustrative Network Computer

FIG. 3 shows one embodiment of network computer 300 that may be included in a system implementing one or more embodiments of the described innovations. Network computer 300 may include many more or fewer components than those shown in FIG. 3. However, the components shown are sufficient to disclose an illustrative embodiment for practicing these innovations. Network computer 300 may represent, for example, one embodiment of machine learning model repository server computer 116 of FIG. 1.

As shown in the figure, network computer 300 includes a processor 302 in communication with a memory 304 via a bus 328. Network computer 300 also includes a power supply 330, network interface 332, audio interface 356, global positioning systems (GPS) receiver 362, display 350, keyboard 352, input/output interface 338, processor-readable stationary storage device 334, and processor-readable removable storage device 336. Power supply 330 provides power to network computer 300. In some embodiments, processor 302 may be a multiprocessor system that includes one or more processors each having one or more processing/execution cores.

Network interface 332 includes circuitry for coupling network computer 300 to one or more networks, and is constructed for use with one or more communication protocols and technologies including, but not limited to, protocols and technologies that implement any portion of the Open Systems Interconnection model (OSI model), global system for mobile communication (GSM), code division multiple access (CDMA), time division multiple access (TDMA), user datagram protocol (UDP), transmission control protocol/Internet protocol (TCP/IP), Short Message Service (SMS), Multimedia Messaging Service (MMS), general packet radio service (GPRS), WAP, ultra wide band (UWB), IEEE 802.16 Worldwide Interoperability for Microwave Access (WiMax), Session Initiation Protocol/Real-time Transport Protocol (SIP/RTP), or any of a variety of other wired and wireless communication protocols. Network interface 332 is sometimes known as a transceiver, transceiving device, or network interface card (NIC). Network computer 300 may optionally communicate with a base station (not shown), or directly with another computer.

Audio interface 356 is arranged to produce and receive audio signals such as the sound of a human voice. For example, audio interface 356 may be coupled to a speaker and microphone (not shown) to enable telecommunication with others and/or generate an audio acknowledgement for some action. A microphone in audio interface 356 can also be used for input to or control of network computer 300, for example, using voice recognition.

Display 350 may be a liquid crystal display (LCD), gas plasma, electronic ink, light emitting diode (LED), Organic LED (OLED) or any other type of light reflective or light transmissive display that can be used with a computer. Display 350 may be a handheld projector or pico projector capable of projecting an image on a wall or other object.

Network computer 300 may also comprise input/output interface 338 for communicating with external devices or computers not shown in FIG. 3. Input/output interface 338 can utilize one or more wired or wireless communication technologies, such as USB™, Firewire™, WiFi, WiMax, Thunderbolt™, Infrared, Bluetooth™, Zigbee™, serial port, parallel port, and the like.

GPS transceiver 362 can determine the physical coordinates of network computer 300 on the surface of the Earth, and typically outputs a location as latitude and longitude values. GPS transceiver 362 can also employ other geo-positioning mechanisms, including, but not limited to, triangulation, assisted GPS (AGPS), Enhanced Observed Time Difference (E-OTD), Cell Identifier (CI), Service Area Identifier (SAI), Enhanced Timing Advance (ETA), Base Station Subsystem (BSS), or the like, to further determine the physical location of network computer 300 on the surface of the Earth. It is understood that under different conditions, GPS transceiver 362 can determine a physical location for network computer 300.

Network computer 300 may also include sensors 364 for determining geolocation information (e.g., GPS), monitoring electrical power conditions (e.g., voltage sensors, current sensors, frequency sensors, and so on), monitoring weather (e.g., thermostats, barometers, anemometers, humidity detectors, precipitation scales, or the like), light monitoring, audio monitoring, motion sensors, or the like. Sensors 364 may be one or more hardware sensors that collect and/or measure data that is external to network computer 300.

In at least one embodiment, however, network computer 300 may, through other components, provide other information that may be employed to determine a physical location of the network computer, including for example, a Media Access Control (MAC) address, IP address, and the like.

Human interface components can be physically separate from network computer 300, allowing for remote input and/or output to network computer 300. For example, information routed as described here through human interface components such as display 350 or keyboard 352 can instead be routed through the network interface 332 to appropriate human interface components located elsewhere on the network. Human interface components include any component that allows the computer to take input from, or send output to, a human user of a computer. Accordingly, pointing devices such as mice, styluses, track balls, or the like, may communicate through pointing device interface 358 to receive user input.

Memory 304 may include Random Access Memory (RAM), Read-Only Memory (ROM), and/or other types of non-transitory computer readable and/or writeable media. Memory 304 illustrates an example of computer-readable storage media (devices) for storage of information such as computer-readable instructions, data structures, program modules or other data. Memory 304 stores a unified extensible firmware interface (UEFI) 308 for controlling low-level operation of network computer 300. The memory also stores an operating system 306 for controlling the operation of network computer 300. It will be appreciated that this component may include a general-purpose operating system such as a version of UNIX, or LINUX™, or a specialized operating system such as Microsoft Corporation's Windows® operating system, or the Apple Corporation's OSX® operating system. The operating system may include, or interface with a Java virtual machine module that enables control of hardware components and/or operating system operations via Java application programs. Likewise, other runtime environments may be included.

Memory 304 may further include one or more data storage 310, which can be utilized by network computer 300 to store, among other things, applications 320 and/or other data. For example, data storage 310 may also be employed to store information that describes various capabilities of network computer 300. The information may then be provided to another device or computer based on any of a variety of events, including being sent as part of a header during a communication, sent upon request, or the like. Data storage 310 may also be employed to store social networking information including address books, buddy lists, aliases, user profile information, or the like. Data storage 310 may further include program code, data, algorithms, and the like, for use by one or more processors, such as processor 302 to execute and perform actions such as those actions described below. In one embodiment, at least some of data storage 310 might also be stored on another component of network computer 300, including, but not limited to, non-transitory media inside processor-readable removable storage device 336, processor-readable stationary storage device 334, or any other computer-readable storage device within network computer 300, or even external to network computer 300. Data storage 310 may include, for example, machine learning (ML) models 314, ML model envelopes 316, datasets 317 (e.g., customer data sets, validation data sets, training data sets, or the like), or the like.

Applications 320 may include computer executable instructions which, when executed by network computer 300, transmit, receive, and/or otherwise process messages (e.g., SMS, Multimedia Messaging Service (MMS), Instant Message (IM), email, and/or other messages), audio, video, and enable telecommunication with another user of another mobile computer. Other examples of application programs include calendars, search programs, email client applications, IM applications, SMS applications, Voice Over Internet Protocol (VOIP) applications, contact managers, task managers, transcoders, database programs, word processing programs, security applications, spreadsheet programs, games, and so forth. Applications 320 may include machine learning (ML) engine 322, ingestion engine 324, ML model training engine 326, ML model answer engine 329, other applications 331, or the like, that may perform actions further described below. In at least one of the various embodiments, one or more of the applications may be implemented as modules and/or components of another application. Further, in at least one of the various embodiments, applications may be implemented as operating system extensions, modules, plugins, or the like.

In at least one of the various embodiments, applications, such as, machine learning (ML) engine 322, ingestion engine 324, ML model training engine 326, ML model answer engine 329, other applications 331, or the like, may be arranged to employ geo-location information to select one or more localization features, such as, time zones, languages, currencies, calendar formatting, or the like. Localization features may be used in user-interfaces, reports, as well as internal processes and/or databases. In at least one of the various embodiments, geo-location information used for selecting localization information may be provided by GPS 362. Also, in some embodiments, geolocation information may include information provided using one or more geolocation protocols over the networks, such as, wireless network 108 and/or network 110.

Furthermore, in at least one of the various embodiments, machine learning (ML) engine 322, ingestion engine 324, ML model training engine 326, ML model answer engine 329, other applications 331, may be operative in a cloud-based computing environment. In at least one of the various embodiments, these engines, and others, that comprise the machine learning model repository may be executing within virtual machines and/or virtual servers that may be managed in a cloud-based computing environment. In at least one of the various embodiments, in this context applications including the engines may flow from one physical network computer within the cloud-based environment to another depending on performance and scaling considerations automatically managed by the cloud computing environment. Likewise, in at least one of the various embodiments, virtual machines and/or virtual servers dedicated to machine learning (ML) engine 322, ingestion engine 324, ML model training engine 326, ML model answer engine 329, other applications 331, may be provisioned and de-commissioned automatically.

Further, in some embodiments, network computer 300 may also include hardware security module (HSM) 360 for providing additional tamper resistant safeguards for generating, storing and/or using security/cryptographic information such as keys, digital certificates, passwords, passphrases, two-factor authentication information, or the like. In some embodiments, the hardware security module may be employed to support one or more standard public key infrastructures (PKI), and may be employed to generate, manage, and/or store key pairs, or the like. In some embodiments, HSM 360 may be arranged as a hardware card that may be installed in a network computer.

Additionally, in one or more embodiments (not shown in the figures), network computer 300 may include one or more embedded logic hardware devices instead of one or more CPUs, such as, Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs), Programmable Array Logic (PALs), or the like, or combination thereof. The one or more embedded logic hardware devices may directly execute their embedded logic to perform actions. Also, in one or more embodiments (not shown in the figures), the network computer may include one or more hardware microcontrollers instead of one or more CPUs. In at least one embodiment, the one or more microcontrollers may directly execute embedded logic to perform actions and access their own internal memory and their own external Input and Output Interfaces (e.g., hardware pins and/or wireless transceivers) to perform actions. For example, they may be arranged as Systems On Chips (SOCs).

Illustrative Logical System Architecture

FIG. 4 shows a logical schematic of a portion of machine learning model repository system 400 arranged in accordance with one or more of the various embodiments. In one or more of the various embodiments, system 400 represents logical interaction between or among machine learning (ML) engine 322, ingestion engine 324, ML model training engine 326, ML model answer engine 329, or the like, that may be hosted by one or more network computers, such as network computer 300.

In one or more of the various embodiments, system 400 may be arranged to include one or more raw data sets, such as, raw data 402 comprising customer data from various sources. In this example, raw data 402 represents customer data that has been previously collected by a customer. In some cases, raw data 402 may include historical data collected from various sources.

In one or more of the various embodiments, raw data 402 may be provided to an ingestion engine, such as, ingestion engine 404. In one or more of the various embodiments, ingestion engine 404 may be arranged to execute one or more data ingestion processes to pre-process some or all of raw data 402 to provide objects that are compatible with the ML model repository. For example, ingestion engine 404 may be arranged to conform raw data to a well-defined schema used by the ML model repository platform.

In one or more of the various embodiments, customer data set 406 represents customer objects that conform to a defined schema (e.g., model schema). In one or more of the various embodiments, customer data set 406 may be one or more data stores, such as, databases, cloud-based storage, file stores, or the like. In some embodiments, some or all of customer data set 406 may be distributed across different locations. Likewise, in some embodiments, the contents of data set 406 may be segregated into various priority tiers such as, local caches, distributed caches, normal storage, archival storage, or the like. In one or more of the various embodiments, the data segregation may occur during ingestion based on the execution of one or more ingestion rules, or it may occur sometime after the data is initially ingested.

In one or more of the various embodiments, machine learning (ML) model design and training engine 408 may be arranged to employ customer data set 406 to enable customers/users to design, implement, train, or test various machine learning models. ML model design and training engine 408 may include one or more user interfaces that enable users, such as, data scientists, to design, validate, or train ML models. In one or more of the various embodiments, if a ML model is trained it may be added to a ML model repository data store, such as ML model repository 412. Accordingly, in some embodiments, it may then be made available to classify or otherwise provide answers based on information provided by the customer or other applications.

Also, in one or more of the various embodiments, ML models may be designed or trained based on one or more validation data sets that may be made available by the ML modeling platform. For example, in some embodiments, validation dataset 410 may include various data sets that are arranged to conform to one or more model schemas supported by the ML modeling platform. Accordingly, in one or more of the various embodiments, customers may be enabled to design and train ML models using the provided validation datasets rather than having to provide their own data for training their ML models.

In one or more of the various embodiments, ML models stored in ML model repository 412 may be made available to one or more users or customers for use with their own data inputs. In some embodiments, the customers or users may be enabled to employ one or more ML models that may have been provided by other users and shared or otherwise made available.

In one or more of the various embodiments, raw data 414 represents production data provided by a customer. In one or more of the various embodiments, raw data 414 may be similar to raw data 402. In some cases, raw data 402 and raw data 414 may contain the same information or overlapping information. In some embodiments raw data 414 may be other data collected separately or subsequently from raw data 402. In some embodiments, raw data 414 may be produced or collected according to one or more well-defined schedules. Likewise, raw data 414 may include streaming data or real-time data.

In one or more of the various embodiments, production raw data, such as, raw data 414 still may require ingestion. Accordingly, in some embodiments, an ingestion engine, such as, ingestion engine 416 may be arranged to convert raw data 414 to model objects that conform to the appropriate model schema. Next, in one or more of the various embodiments, those model objects may be provided to a ML model answer engine, such as, ML model answer engine 418 that may be arranged to provide the model objects to one or more ML models from repository 412. Accordingly, in one or more of the various embodiments, ML model answer engine 418 may be arranged to provide one or more answer reports, such as, answer report 420 that include one or more answers provided by ML model answer engine 418.
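
The production flow described above (raw data, ingestion, model objects, answer engine, answer report) can be sketched as follows. This is a minimal illustration only: the function names `ingest` and `answer`, the dictionary-based model objects, the raw field names, and the toy scoring rule are all hypothetical and are not defined by the embodiments.

```python
from typing import Callable, Dict, List

ModelObject = Dict[str, object]
# Stand-in for a trained ML model: maps a model object to a score value.
Model = Callable[[ModelObject], float]


def ingest(raw_records: List[Dict[str, str]]) -> List[ModelObject]:
    """Hypothetical ingestion step: coerce raw records to a toy schema."""
    return [
        {"patient_id": rec["PatientNo"].strip(), "age": int(rec["Age"])}
        for rec in raw_records
    ]


def answer(objects: List[ModelObject], models: Dict[str, Model]) -> List[dict]:
    """Hypothetical answer step: score every object with every model."""
    return [
        {"patient_id": obj["patient_id"], "model": name, "score": model(obj)}
        for obj in objects
        for name, model in models.items()
    ]


# Toy "repository" holding one model; the scoring rule is invented.
repository = {"age_risk": lambda obj: min(obj["age"] / 100.0, 1.0)}
raw = [{"PatientNo": " 17 ", "Age": "50"}]
report = answer(ingest(raw), repository)
```

Here `report` plays the role of an answer report: one entry per model object and model, each carrying the score value the model produced.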

FIG. 5 illustrates logical system 500 for ingesting customer data sets in accordance with one or more of the various embodiments. In at least one of the various embodiments, system 500 may include: an ingestion engine, such as, ingestion engine 502; one or more raw data sets, such as, raw data set 504; ingestion rules, such as, ingestion rules 506; or the like. In at least one of the various embodiments, ingestion engine 502 may employ raw data set 504 and ingestion rules 506 to provide model objects 508.

In at least one of the various embodiments, model objects 508 may represent objects that have been coerced into objects that conform with a modeling schema supported by the machine learning modeling platform.

In at least one of the various embodiments, model objects may be normalized such that anomalies, such as, spelling errors, divergent abbreviations and/or short hand notations, or different values representing the same concept, may be “normalized” to common values/representations defined by the modeling schema.

In some embodiments, ingestion rules 506 may include rules and information for mapping various ad hoc/colloquial/casual names that may be used for the raw data values to common names or data types supported by the modeling schema. For example, depending on the source of the raw data, different names or values may be entered into the data for the same modeling schema field. For example, in some cases, the raw data values may be represented using abbreviations; in other cases, the raw data may be spelled using different/various capitalization, spelling, and so on. Accordingly, in at least one of the various embodiments, ingestion engine 502 may be arranged to execute one or more ingestion rules to map from one or more values in the raw datasets to a value that may be consistent with the modeling schema.
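
As a rough illustration of such a value-mapping rule, the sketch below normalizes abbreviations and capitalization variants to a single schema value. The synonym table, the field it stands for, and the function name `normalize_value` are assumptions made for this example and do not come from the embodiments.

```python
from typing import Dict

# Hypothetical ingestion rule: maps ad hoc spellings to one schema value.
STATE_SYNONYMS: Dict[str, str] = {
    "wa": "Washington",
    "wash.": "Washington",
    "washington": "Washington",
}


def normalize_value(raw_value: str, synonyms: Dict[str, str]) -> str:
    """Return the schema value for raw_value, ignoring case and padding.

    Unknown values are returned stripped but otherwise unchanged so that a
    later rule (or a human reviewer) can handle them.
    """
    key = raw_value.strip().lower()
    return synonyms.get(key, raw_value.strip())
```

Passing unknown values through unchanged is one possible design choice; an ingestion engine could equally reject or flag them.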

Likewise, in one or more of the various embodiments, ingestion engine 502 may be arranged to map one or more properties in the raw datasets to particular fields of particular model objects based on the modeling schema.

In one or more of the various embodiments, raw data sets, such as, raw data sets 504 may be provided by one or more source data server computers, such as, source data server computer 118. Accordingly, in some embodiments, source data servers may communicate notifications over a network to indicate that raw data is available for ingestion. In some cases, if network communication between the machine learning model repository servers and source data servers is inactive or otherwise disabled, the raw data may be cached at the source data server. Thus, if the communication between the source data servers and the machine learning model repository servers is enabled again, the source data servers may communicate a notification indicating that the network communication (e.g., a network connection) has been re-established. Also, when the network connection is re-established, the ingestion engine may be arranged to obtain any cached or remaining raw data records from one or more source data servers.

Also, in at least one of the various embodiments, a notification may be provided from another source, including a user. Such notifications may indicate that one or more raw data sets have changed and/or one or more raw data sets are available for ingestion. Accordingly, in at least one of the various embodiments, when the ingestion engine establishes network communication with the one or more source data servers it may, in acknowledgement of the notifications, obtain the updated raw data for ingestion.
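
The cache-then-deliver behavior described above can be sketched as follows. The class `SourceDataServer`, its methods, and the `delivered` list (standing in for records obtained by the ingestion engine) are hypothetical; a real deployment would involve actual network notifications, which are omitted here.

```python
from collections import deque
from typing import Deque, Dict, List

Record = Dict[str, str]


class SourceDataServer:
    """Hypothetical source data server that caches records while offline."""

    def __init__(self) -> None:
        self.connected = False
        self._cache: Deque[Record] = deque()
        # Stand-in for records the ingestion engine has obtained.
        self.delivered: List[Record] = []

    def submit(self, record: Record) -> None:
        """Deliver immediately when connected; otherwise cache locally."""
        if self.connected:
            self.delivered.append(record)
        else:
            self._cache.append(record)

    def reconnect(self) -> int:
        """Re-establish the connection and flush cached records in order.

        Returns the number of cached records that were flushed.
        """
        self.connected = True
        flushed = 0
        while self._cache:
            self.delivered.append(self._cache.popleft())
            flushed += 1
        return flushed
```

Using a first-in, first-out queue preserves the order in which records were collected, which is an assumption of this sketch rather than a stated requirement.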

In at least one of the various embodiments, ingestion rules may include one or more sets of instructions/conditions for transforming raw data records into model data records. In at least one of the various embodiments, the ingestion rules may be arranged to normalize values included in raw data records to the values that comprise the model data records. Normalizing in this context can mean to map/transform various input values to common values, and the like, rather than being limited to arithmetical normalization.

In at least one of the various embodiments, ingestion rules 506 may be arranged to compare raw data values with the allowed or expected values based on the modeling schema. In at least one of the various embodiments, one or more rules may include pattern matching instructions that may compare values in the raw data records with expected values.
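
One way such pattern matching instructions might look is sketched below using regular expressions. The `insured` field, its allowed values, and the function name `match_expected` are invented for this example.

```python
import re
from typing import Optional

# Hypothetical patterns for an "insured" field whose schema allows only
# the expected values "yes" and "no".
INSURED_PATTERNS = [
    (re.compile(r"^(y|yes|true|1)$", re.IGNORECASE), "yes"),
    (re.compile(r"^(n|no|false|0)$", re.IGNORECASE), "no"),
]


def match_expected(raw_value: str) -> Optional[str]:
    """Return the expected schema value matched by raw_value, or None."""
    value = raw_value.strip()
    for pattern, expected in INSURED_PATTERNS:
        if pattern.match(value):
            return expected
    return None
```

Returning `None` for unmatched input leaves it to the caller to decide whether to reject the record or apply another rule.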

Also, in at least one of the various embodiments, ingestion rules 506 may be arranged to scan more than one field in the raw records to determine a correct mapping/transformation for model objects. In some embodiments, there may be information scattered across different fields/columns of the raw data records that may be viewed as a whole to make a determination for a mapping/transformation to a model object.

In at least one of the various embodiments, one or more of the ingestion rules may be arranged based on prior knowledge of the structure and/or fields of the raw datasets. Accordingly, one or more fields of the raw dataset records may be mapped directly to a field of the model object. For example, in at least one of the various embodiments, if raw data records include a field labeled ‘LastName’ that is known to include the last name of an employee, the ingestion rule may be arranged to directly map the LastName fields to the last name field of the employee model object.

Continuing with the same non-limiting example, in other embodiments, in the absence of prior knowledge, the ingestion rules may be arranged to scan raw data records to look for contents that appear to be an employee's last name and then map those values into a model object's last name field.
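
As a non-limiting illustration, the two kinds of ingestion rules described above (direct mapping from a known field, and scanning when no prior knowledge is available) may be sketched as follows. The function name, the `last_name` key, and the `KNOWN_SURNAMES` lookup table are hypothetical and are introduced only for this sketch; an actual embodiment may use any suitable matching technique.

```python
# Hypothetical lookup table used to decide whether a value "appears to be"
# a last name; this is an assumption made for illustration only.
KNOWN_SURNAMES = {"black", "smith", "jones"}


def ingest_last_name(raw_record):
    """Return the model object fields derived from one raw data record."""
    # Rule 1: prior knowledge of the raw structure allows a direct mapping.
    if "LastName" in raw_record:
        return {"last_name": raw_record["LastName"]}
    # Rule 2: without prior knowledge, scan every field for a plausible value.
    for value in raw_record.values():
        if isinstance(value, str) and value.lower() in KNOWN_SURNAMES:
            return {"last_name": value}
    # Nothing matched, so the field is simply left out of the model object.
    return {}
```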

In one or more of the various embodiments, raw datasets 504 may be provided in bulk as part of an initial on-boarding of a customer. For example, initial preparation for a new customer may include ingesting customer raw datasets into a ML model repository. Likewise, in some embodiments, raw datasets (e.g., raw data) may be provided periodically or in real-time as they are collected by the customer. For example, in some embodiments, if the customer is a health clinic, raw data such as patient intake information may be provided as patients are seen by the clinic. In some embodiments, such data may be collected and provided periodically (e.g., every hour, daily, or the like) or it may be provided in real-time as patients are encountered.

In one or more of the various embodiments, raw datasets 504 may be databases or other data stores maintained by the customer. In other embodiments, raw datasets 504 may be provided via direct or indirect integration with other third party systems or services, such as, for example, patient intake applications, hospital management applications, insurance companies, or the like. Further, in some embodiments, one or more users may be enabled to directly provide (e.g., data entry) data to an ingestion engine.

FIG. 6 illustrates logical system 600 for ingesting data sets in accordance with one or more of the various embodiments. In this example, system 600 represents portions of a raw data set and portions of model objects produced therefrom by an ingestion engine.

Accordingly, in at least one of the various embodiments, system 600 may include: patient raw data 602, appointment raw data 612, patient model object 626, and encounter object 644.

In this example, patient raw data 602 represents a raw data representation of a patient of a healthcare enterprise. In this example, for some embodiments, patient raw data 602 includes row ID column 604, patient number column 606, name column 608, job column 610, among others. Also, in this example, appointment raw data 612 may be arranged to include various columns, such as, patient ID 614, treatment code 616, clinic 618, duration 622, staff 624, or the like.

In one or more of the various embodiments, the raw data objects (e.g., patient raw data 602 and appointment raw data 612) may be provided to an ingestion engine. Accordingly, in some embodiments, the ingestion engine may be arranged to provide one or more model objects, such as, patient object 626 and encounter object 644, based on the raw data objects and one or more ingestion rules.

In one or more of the various embodiments, the ingestion engine may be arranged to execute one or more ingestion rules that transform raw data into model objects that conform to one or more defined model schemas.

In this example, patient object 626 is represented using fields 628 and field values 630. In some embodiments, patient objects, such as patient object 626, may include fields, such as ID field 632, first name field 634, last name field 636, age 638, gender 640, employment 642, or the like. Likewise, in this example, encounter object 644 represents a patient's encounter with a healthcare enterprise. Accordingly, for this example, it includes encounter object fields 648 and encounter object field values 650. Further, in this example, encounter object 644 may be arranged to include fields, such as, patient ID field 652, timestamp field 654, code field 656, location field 658, staff field 660, duration field 662, or the like.

In some embodiments, the ingestion engine may be arranged to take raw data objects and produce model objects. In this example, patient raw data 602 is arranged to include the patient's first name and last name in one field (e.g., column 608). However, the patient model object requires first name and last name to be separate fields. Accordingly, in this example, for some embodiments, the ingestion engine may be arranged to execute an ingestion rule that parses the name column values of patient raw data objects to extract the first name and last name so they may be values for the first name field and last name field of patient model objects. In one or more of the various embodiments, additional transformations may be required, such as, converting raw data birthdate values into age field values, and so on.
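
For illustration only, an ingestion rule of the kind just described may be sketched as shown below. The raw column keys (`patient_number`, `name`, `birthdate`, `job`) and the model field names are assumptions made for this sketch, as is the simplification that the name column holds a first name followed by a single space and a last name.

```python
from datetime import date


def split_name(full_name):
    """Split a combined name value at the first space (a simplifying assumption)."""
    first, _, last = full_name.strip().partition(" ")
    return first, last


def age_from_birthdate(birthdate, today):
    """Convert a birthdate into a whole number of years as of `today`."""
    years = today.year - birthdate.year
    # Subtract one year if the birthday has not yet occurred this year.
    if (today.month, today.day) < (birthdate.month, birthdate.day):
        years -= 1
    return years


def to_patient_object(raw_row, today):
    """Transform one raw patient row into a patient model object (a dict)."""
    first, last = split_name(raw_row["name"])
    obj = {"id": raw_row["patient_number"], "first_name": first, "last_name": last}
    if "birthdate" in raw_row:
        obj["age"] = age_from_birthdate(raw_row["birthdate"], today)
    # An empty or missing Job entry leaves the employment field absent.
    if raw_row.get("job"):
        obj["employment"] = raw_row["job"]
    return obj
```

Passing `today` explicitly, rather than reading the system clock inside the rule, keeps the transformation repeatable when the same raw data is ingested again.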

FIG. 7 illustrates logical system 700 for representing data models that include model objects in accordance with one or more of the various embodiments. In this example, system 700 represents portions of data models that include model objects created from ingested raw data sets. In one or more of the various embodiments, the ML model repository may be arranged to include one or more model schemas that may be represented as one or more graphs. Accordingly, in one or more of the various embodiments, model objects may be included in data models represented as graphs where populated field values in the model objects may be the nodes (e.g., vertices) and the relationships between the field and the model object or other model objects are represented by the edges of the graph.

In one or more of the various embodiments, each model object and its data model representation may be restricted or constrained such that they conform to the model schema. In one or more of the various embodiments, model objects may be arranged such that they may include fewer fields than the model schema allows. Accordingly, in some embodiments, model objects may be arranged to include as much information as may be available for a given object.

In one or more of the various embodiments, some instances of model objects may be missing information while other instances have fields that represent information missing for other model objects. In one or more of the various embodiments, the variation between different instances of model objects may be related to various factors, such as, raw data variability, raw data errors, sampling or data collection time differences, or the like.

As discussed above, in one or more of the various embodiments, during ingestion, the ingestion engine may transform fields from raw data objects into model object fields that conform with the model schema. However, in one or more of the various embodiments, if the field information is unavailable to the ingestion engine, it may produce a model object that may be based on the available information. For example, if patient raw data omits a patient's first name, the patient model object may be created and added to the customer dataset absent a first name field. However, in one or more of the various embodiments, if missing field data is provided later, the appropriate model object fields may be added to the model object and the data model that includes them may be modified accordingly. Note, in some embodiments, information may be missing by design or on account of errors.

For example, in one or more of the various embodiments, raw data patient object 602 includes three patients; two of those raw data patients include an entry for Job and one does not. Accordingly, in this example, patient model objects corresponding to those three patient raw data objects may be processed such that two patient model objects will have their employment field populated while the one patient model object that corresponds to the raw data patient missing the Job field may be created absent a value for employment.

Thus, in one or more of the various embodiments, some definitions in the model schema may be permissive in the sense that some missing fields may be allowed. However, in some embodiments, the model schema may be arranged to prevent unknown or new fields being added to a model object. In some embodiments, the ingestion engine may be arranged to apply one or more ingestion rules to provide default or placeholder values for a given model object field.
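
A minimal sketch of such a permissive schema check is given below, assuming a model schema is reduced to a set of allowed field names. The names `PATIENT_SCHEMA` and `conform`, and the representation of model objects as dictionaries, are hypothetical choices made for this illustration.

```python
# Allowed patient fields, following the fields listed for patient object 626.
PATIENT_SCHEMA = {"id", "first_name", "last_name", "age", "gender", "employment"}


def conform(fields, schema=PATIENT_SCHEMA, defaults=None):
    """Build a model object that may omit fields but may not add unknown ones.

    `defaults` is assumed to contain only field names allowed by `schema`.
    """
    unknown = set(fields) - schema
    if unknown:
        # The schema prevents unknown or new fields from being added.
        raise ValueError(f"fields not allowed by schema: {sorted(unknown)}")
    # Placeholder values are applied first, then overridden by ingested values.
    model_object = dict(defaults or {})
    model_object.update(fields)
    return model_object
```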

In one or more of the various embodiments, as instances of model objects are provided by an ingestion engine, the ingestion engine may be arranged to generate a data model that has a graph corresponding to the model objects.

In this example, data model 702 represents patient model object 704 and encounter object 712. In some embodiments, patient model object 704 may include one or more fields such as first name field 706, last name field 708, and age field 710.

Similarly, in this example, data model 720 represents another patient model object instance, patient model object 722. In this example, patient model object 722 represents an instance that includes first name field 706, last name field 708, age field 728, and employment field 730.

Note, in this example, there are two patient model objects that (are assumed to) conform to the same model schema. However, in this example, for one or more of the various embodiments, since patient model object 704 does not have a value for the employment field, the employment field may be omitted from its data model. In some embodiments, the underlying data structure used to implement model objects or data models may be arranged to maintain a placeholder for all the potential fields for a given model object type. Alternatively, in some embodiments, the fields may be added as needed if they are allowed by the model schema.

In one or more of the various embodiments, model schemas may also define one or more relationships between different model objects. In this example, patient object 704 is related to encounter object 712 (indicated here by an edge illustrated with a dotted line) and patient object 722 is related to encounter object 732. In this example, the relationship represents that an encounter object represents an encounter with a particular patient. Accordingly, in this example, encounter object 712 includes timestamp field 714, code field 716, and location field 718. Likewise, in this example, encounter object 732 includes timestamp field 734, code field 736, location field 738, and surgeon field 740.

In this example, the flexibility of the model schema is illustrated by encounter object 732 including surgeon field 740 that represents a surgeon that was involved with the encounter. In contrast, in this example, encounter object 712 does not include a surgeon field because its associated raw data object did not include this information.

Note, in one or more of the various embodiments, model object generation depends on the interplay of the raw data objects and the ingestion rules. In this example, ingestion rules may be arranged to interpret values for code field 736 based on the Treatment Code column 616 in raw data appointment object 612. In this example, the appointment associated with patient ID 9081 (the second record from the top of appointment object 612) includes a treatment code that corresponds to a surgery and it also includes a staff member listed as Mike Black (column 624). Thus, for example, the ingestion engine may be arranged to assume the staff member listed for a surgery is a surgeon. Also, in some embodiments, ingestion engines may be arranged to perform various validation operations to confirm inferences. For example, the ingestion engine may be arranged to confirm that “Mike Black” is listed in a table of known surgeons that are associated with the treatment code for a given appointment.

As discussed above, in one or more of the various embodiments, raw data used to provide model objects may come from different sources at different times. For example, in some embodiments, patient raw data may be collected every hour while appointment raw data may be collected every four hours. Thus, in this example, one or more patient model objects may be provided absent one or more relevant encounter objects (e.g., model objects representing patient appointments) because the raw data for patients may be provided before the raw data for encounters is available.

As relevant model objects become available via ingestion they may be added to the data model that includes the model objects. In some embodiments, in this example, edges that represent relationships between different model objects will be added to the data models as the model objects become available. In this example, the edges between patient model objects, such as patient model object 704, and encounter model objects, such as encounter model object 712, are represented using a dashed line because such an edge may be absent if the patient model object or the encounter model object has not been ingested.

FIGS. 8A and 8B illustrate logical system 800 for representing model objects and data models in accordance with one or more of the various embodiments. These figures illustrate how a data model that includes model objects may be incrementally extended or created as the ingestion engine ingests various raw data objects to add model objects or model object fields to a system.

FIG. 8A illustrates how a first model object (e.g., model object 802) is added to a data model in accordance with one or more of the various embodiments. For example, this could be a patient model object. Initially it is added to the data model. Note, any given model object may be comprised of fields that may be of various data types, including, single values, lists, sets, ranges, other model objects, or the like.

FIG. 8B illustrates how a second model object, model object 804, is added to a data model associated with model object 802 via relationship edge 806 in accordance with one or more of the various embodiments. In one or more of the various embodiments, the ingestion engine may be arranged to recognize relationships based on the model schema. For example, the ingestion engine may be arranged to map values from raw data objects into particular fields of model objects. Thus, in some embodiments, the model schema may be arranged to define allowed relationships between model objects based on defined key field values. For example, the model schema may be arranged to define a relationship between patient objects and encounter objects based on them sharing the same value for patient ID, or the like. Thus, in some embodiments, as model objects are added to a data model the relationships that conform to the model schema may be included.
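
The incremental behavior described for FIGS. 8A and 8B may be sketched, under stated assumptions, as a small graph-like structure in which an edge is recorded once both related model objects have been ingested. The `DataModel` class, its attribute names, and the use of `patient_id` as the shared key field are hypothetical and serve only as an illustration.

```python
class DataModel:
    """Toy data model that links encounters to patients by shared patient ID."""

    def __init__(self):
        self.patients = {}    # patient ID -> patient model object
        self.encounters = []  # encounter model objects in arrival order
        self.edges = set()    # (patient ID, encounter index) relationship edges

    def add_patient(self, patient):
        self.patients[patient["id"]] = patient
        self._link()

    def add_encounter(self, encounter):
        self.encounters.append(encounter)
        self._link()

    def _link(self):
        # Record an edge only when both related objects are present.
        for index, encounter in enumerate(self.encounters):
            if encounter["patient_id"] in self.patients:
                self.edges.add((encounter["patient_id"], index))
```

In this sketch the order of arrival does not matter: an encounter ingested before its patient remains unlinked until the patient model object is added.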

FIG. 9 illustrates a logical representation of machine learning (ML) model envelope 900 for scoring model objects in accordance with one or more of the various embodiments. In one or more of the various embodiments, ML model envelopes may be comprised of parameter model 902, ML model 904, model output 906, or the like.

In one or more of the various embodiments, a parameter model, such as parameter model 902, may be defined in terms of the model schema. In some embodiments, parameter model 902 may act as a guard that restricts which model objects may be provided to a ML model or ML model envelope. In one or more of the various embodiments, parameter model 902 may identify one or more portions of a data model that may be provided to a particular ML model, such as ML model 904.

In one or more of the various embodiments, parameter models define the model objects that are applicable or eligible for scoring with a given ML model or ML model envelope. For example, in one or more of the various embodiments, parameter model 902 may be arranged to require model objects that conform to a data model portion such as data model 702. In some embodiments, different parameter models may be arranged to accept different model objects, such as those conforming to data model 720. Note, in one or more of the various embodiments, a ML model that can consume data model 702 may also be able to consume data model 720 since data model 720 includes all of the required model objects and fields to meet such a requirement. However, a ML model that requires data model 720 would exclude data model 702 since data model 702 is missing one or more required model objects or model object fields, such as, in this example, where a surgeon field in the encounter model object and an employment field in the patient model object may be required by a parameter model.
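
One possible way to express this comparison, offered only as a sketch, is to treat a parameter model as a mapping from model object types to the sets of fields they must contain. The dictionary layout, the function name `satisfies`, and the field names below are assumptions for illustration; field values are omitted because only field presence is compared.

```python
def satisfies(data_model, parameter_model):
    """Return True if every required object type and field is present."""
    for object_type, required_fields in parameter_model.items():
        present_fields = data_model.get(object_type)
        if present_fields is None or not required_fields.issubset(present_fields):
            return False
    return True


# Field sets loosely following data models 702 and 720 of FIG. 7.
DATA_MODEL_702 = {
    "patient": {"first_name", "last_name", "age"},
    "encounter": {"timestamp", "code", "location"},
}
DATA_MODEL_720 = {
    "patient": {"first_name", "last_name", "age", "employment"},
    "encounter": {"timestamp", "code", "location", "surgeon"},
}

# Hypothetical parameter models: one basic, one requiring surgery details.
BASIC_PARAMETERS = {"patient": {"age"}, "encounter": {"code"}}
SURGERY_PARAMETERS = {"patient": {"employment"}, "encounter": {"surgeon"}}
```

With these definitions, both data models satisfy `BASIC_PARAMETERS`, while only `DATA_MODEL_720` satisfies `SURGERY_PARAMETERS`.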

In one or more of the various embodiments, ML model 904 represents the actual machine learning model included in ML model envelope 900 that may be executed by the machine learning engine. In some embodiments, a machine learning repository or ML model answer engine may be arranged to provide one or more model objects that match a ML model envelope's parameter model. Accordingly, the ML model may accept the matching model objects that satisfy the parameter model and produce a score based on the provided model objects.

In one or more of the various embodiments, the particular ML model or its underlying model implementation may be arbitrary as long as it accepts the model objects that satisfy its associated parameter model. For example, a simple ML model may be arranged to provide scores that indicate if a patient is old or young. Accordingly, in this example, the ML model may include a parameter model that requires a patient model object that includes an age value. Thus, in this trivial example, the ML model may be arranged to produce a true result if the age value is above a defined threshold. In contrast, in some embodiments, ML models may be arranged to be a complex artificial neural network that is trained to consume data models that include several complex model objects that have many model object fields.
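
The trivial old-or-young example may be sketched as follows. The `Envelope` class, the reduction of the parameter model to a set of required field names, and the threshold value of 65 are all assumptions chosen for illustration and are not taken from the embodiments.

```python
class Envelope:
    """Toy ML model envelope: a parameter model guarding an arbitrary model."""

    def __init__(self, required_fields, model):
        self.required_fields = set(required_fields)  # stands in for the parameter model
        self.model = model                           # any callable that returns a score

    def accepts(self, model_object):
        return self.required_fields.issubset(model_object)

    def score(self, model_object):
        if not self.accepts(model_object):
            raise ValueError("model object does not satisfy the parameter model")
        return self.model(model_object)


AGE_THRESHOLD = 65  # hypothetical threshold

# The "model" here is a single comparison; a trained network could be
# substituted without changing the envelope.
is_old = Envelope({"age"}, lambda patient: patient["age"] > AGE_THRESHOLD)
```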

In one or more of the various embodiments, parameter model 902 may be used by a ML model answer engine, such as ML model answer engine 329, to query the customer dataset for model objects that may be eligible inputs to a given ML model or ML model envelope. For example, if the parameter model requires patients that have one or more encounters with a health facility, model objects conforming to data model 702 or data model 720 would be eligible. However, in some embodiments, if the parameter model for a ML model requires patient model objects that include employment information and encounter model objects that are associated with surgeons and surgery, only model objects conforming to data model 720 would be eligible for scoring.

In one or more of the various embodiments, ML models may be comprised of two or more other ML models. For example, ML model envelope 908 includes ML model 912 that is comprised of two or more ML models (e.g., ML models 916). Accordingly, in one or more of the various embodiments, parameter model 910 may be arranged to accept model objects conforming to data models that are required by or compatible with the included ML models 916. Likewise, in one or more of the various embodiments, model output 914 may be arranged to produce output values based on a combination of sub-outputs produced by ML models 916. Note, the particular combination of the sub-outputs may be included as part of ML model 912 or model output 914 based on the application of ML model 912. In some embodiments, ML models that include more than one ML model may be arranged to include rules that select one or more sub-outputs to include or combine into its ultimate output. For example, in some embodiments, ML model 912 may be arranged to exclude one or more outlying results and then provide a score that is based on an average of the remaining results. Likewise, in some embodiments, rules may employ dynamic programming such that one or more of the included ML models are used depending on the input parameter values.
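
As one hedged illustration of combining sub-outputs, the sketch below drops the single lowest and single highest sub-output and averages the rest. Treating the extremes as the outlying results is an assumption made for this sketch; the embodiments do not prescribe a particular outlier rule, and the function name is hypothetical.

```python
from statistics import mean


def combined_score(model_object, sub_models):
    """Average the sub-model outputs after discarding the two extremes."""
    outputs = sorted(sub_model(model_object) for sub_model in sub_models)
    # With fewer than three sub-outputs there is nothing sensible to discard.
    if len(outputs) > 2:
        outputs = outputs[1:-1]
    return mean(outputs)
```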

Also, in one or more of the various embodiments, one or more ML models may be arranged in series or in parallel. In some embodiments, one or more ML models may be arranged to score model objects before they are provided to other ML models for additional scoring. In some embodiments, one or more ML models may be employed as a filter to select model objects to pass to other ML models for further scoring. For example, a first ML model may be used to score objects such that model objects associated with a score that is outside a defined range of values may be excluded from consideration by the one or more other included ML models.
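
A series arrangement in which a first ML model acts as a filter may be sketched as shown below. The function and parameter names are hypothetical, and both models are represented simply as callables that return a numeric score.

```python
def score_in_series(model_objects, filter_model, final_model, low, high):
    """Score with `final_model` only those objects the filter keeps in range."""
    results = []
    for model_object in model_objects:
        # Objects whose first-stage score falls outside [low, high] are excluded.
        if low <= filter_model(model_object) <= high:
            results.append((model_object, final_model(model_object)))
    return results
```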

In one or more of the various embodiments, the specific application of the included ML models, such as ML models 916, may be determined during the model training phase. For example, if the included ML models are employed as portions of an artificial neural network that is trained as a whole, the details of learned models, such as, connection weights, cost function parameters, or the like, may be determined based on model training rather than intentional design.

Generalized Operations

FIGS. 10-14 represent the generalized operations for a machine learning model repository in accordance with at least one of the various embodiments. In one or more of the various embodiments, processes 1000, 1100, 1200, 1300, and 1400 described in conjunction with FIGS. 10-14 may be implemented by and/or executed on a single network computer, such as network computer 300 of FIG. 3. In other embodiments, these processes or portions thereof may be implemented by and/or executed on a plurality of network computers, such as network computer 300 of FIG. 3. However, embodiments are not so limited, and various combinations of network computers, client computers, virtual machines, or the like may be utilized. Further, in one or more of the various embodiments, the processes described in conjunction with FIGS. 10-14 may be operative in a machine learning model repository such as described in conjunction with FIGS. 4-9.

FIG. 10 illustrates an overview flowchart for process 1000 for a machine learning (ML) repository in accordance with one or more of the various embodiments. After a start block, at block 1002, in one or more of the various embodiments, optionally, a ML model repository may be arranged to ingest training data. As described above, a customer may be enabled to provide training data based on their own raw data sets. Accordingly, in one or more of the various embodiments, the ingestion engine may ingest the raw data and transform it as necessary to conform it to one or more model schemas that may be supported or enforced by the ML model repository.

This block is indicated as being optional, because in some embodiments, a customer may be enabled to use training data sets that are already available in the ML model repository. For example, the repository may be loaded with validation data sets that include model objects sufficient for training ML models. In some cases, the model objects may have been provided based on a customer's raw data that was ingested earlier. In other cases, for some embodiments, the model objects used by a customer for training may be provided by third-party data providers.

At block 1004, in one or more of the various embodiments, optionally, the customer (e.g., a user of the ML model repository) may be enabled to design, train, or deploy one or more ML models and include them in one or more ML model envelopes. As described above, a user may design one or more ML models based on the available model objects or model schema.

In one or more of the various embodiments, designing ML models may include providing one or more queries, evaluators, expressions, rules, conditions, seed values, or the like, that will comprise the ML model. In one or more of the various embodiments, one or more portions of the ML models may be designed using programs, scripts, data structures, or the like, using one or more computer programming languages, such as, R, Matlab, SAS, Perl, C++, Java, or the like, or combination thereof. Further, in some embodiments, custom programming languages may be used as well as visual programming tools or environments.

This block is indicated as being optional, because in some embodiments, a customer may be enabled to use ML models that are already available in the ML model repository. For example, the repository may be loaded with ML models sufficient for answering questions that the customer intends to pose. In some cases, the ML models may have been designed, trained, or deployed by the customer. In other cases, for some embodiments, the ML models used by a customer may be provided by third-party ML model providers. Accordingly, in some embodiments, a customer may be enabled to design ML model envelopes that include previously designed or trained ML models.

At block 1006, in one or more of the various embodiments, the ML model repository may be arranged to provide the raw input to one or more ML models. In one or more of the various embodiments, the ML model repository may be arranged to accept raw data for scoring from various sources depending on the application. In some embodiments, the raw data may be real-time data or signal feeds from other applications. In other embodiments, the raw data may be loaded periodically from one or more data stores, such as, file dumps, log files, databases, message queues, or the like.

In one or more of the various embodiments, the ML model repository may be arranged to enable data from various sources to be provided. In some embodiments, the ML model repository may be arranged to have one or more ingestion interfaces that may be used to enable customers or other third-parties to provide raw data objects. For example, customers may be enabled to employ REST APIs to provide input raw data. In other embodiments, the ML model repository (e.g., the ingestion engine) may be arranged to monitor one or more data stores (e.g., file repositories, cloud-storage data repositories, or the like) and input raw data that a customer may place within.

At block 1008, in one or more of the various embodiments, the ML model repository may be arranged to transform the input raw data objects into model objects that conform with one or more model schemas. As discussed above, ML models included in the ML model repository are arranged to require model objects that conform to a particular model schema. Accordingly, in one or more of the various embodiments, an ingestion engine may be arranged to perform the necessary data processing or transformations.

At block 1010, in one or more of the various embodiments, the ML model repository may be arranged to select one or more ML models that may be applicable to the model objects produced from the raw data. In one or more of the various embodiments, the ML model repository may identify or select the one or more ML models based on a comparison of the provided model objects and the parameter models associated with each ML model. In some embodiments, ML models that are associated with a parameter model that matches the provided model objects may be selected to score the model objects.

At block 1012, in one or more of the various embodiments, the ML model repository may be arranged to score the model objects using one or more machine learning models. As described above, one or more ML models included in a ML model envelope may be arranged to produce scores (e.g., answers or other results) based on evaluating model objects that may comprise portions of data models that conform to one or more model schemas. The particular score or result depends on the design of the ML models and the model objects that they score. Next, control may be returned to a calling process.
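
Blocks 1010 and 1012 taken together may be sketched, in simplified form, as selecting every envelope whose parameter model matches a model object and then collecting the resulting scores. The envelope representation (a pair of required field names and a callable) and the function name are assumptions made for this illustration.

```python
def select_and_score(model_object, envelopes):
    """Return {envelope name: score} for each envelope the object satisfies."""
    report = {}
    for name, (required_fields, model) in envelopes.items():
        # Block 1010: compare the model object with the parameter model.
        if set(required_fields).issubset(model_object):
            # Block 1012: execute the selected ML model to produce a score.
            report[name] = model(model_object)
    return report


# Hypothetical envelopes used only to exercise the sketch.
EXAMPLE_ENVELOPES = {
    "is_old": ({"age"}, lambda obj: obj["age"] > 65),
    "is_employed": ({"employment"}, lambda obj: bool(obj["employment"])),
}
```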

FIG. 11 illustrates a flowchart for process 1100 for data ingestion for a machine learning (ML) repository in accordance with one or more of the various embodiments. After a start block, at block 1102, in one or more of the various embodiments, raw data or raw data objects may be provided to an ingestion engine. As described above, the raw data may be provided in various formats or from various sources.

At block 1104, in one or more of the various embodiments, the ingestion engine may be arranged to map raw data fields into one or more model object fields. In one or more of the various embodiments, one or more raw data fields may be mapped directly to one or more model object fields. In one or more of the various embodiments, one or more ingestion rules may be executed by the ingestion engine to perform the mapping from raw data object fields to model object fields.

At block 1106, in one or more of the various embodiments, the ingestion engine may be arranged to compose or decompose one or more raw data fields into one or more model object fields. For example, if a single raw data field includes data for two model object fields, the ingestion engine may be arranged to execute one or more ingestion rules that parse the single raw data field to identify the values that should be mapped to each of the two model object fields. Likewise, for example, if two or more raw data fields include values that together represent a single model object field, the ingestion engine may be arranged to produce a value for the single model object field based on the two or more raw data object fields.
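
The composing case may be illustrated by combining two raw fields into a single timestamp value. The raw field names `date` and `time` and their text formats are assumptions made for this sketch and do not appear in the figures.

```python
from datetime import datetime


def compose_timestamp(raw_row):
    """Combine separate raw date and time fields into one timestamp value."""
    combined = f"{raw_row['date']} {raw_row['time']}"
    # Assumed raw formats: ISO-style date and 24-hour hours:minutes.
    return datetime.strptime(combined, "%Y-%m-%d %H:%M")
```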

At block 1108, in one or more of the various embodiments, the ingestion engine may be arranged to normalize one or more raw data fields into model object field values. In one or more of the various embodiments, the ingestion engine may be arranged to execute one or more normalization operations to conform the ingested data to the model schema. For example, the model schema may constrain one or more model object fields to be represented as values from 0.0 to 1.0.

In other embodiments, a model schema may constrain model object field values in various ways, such as, value ceilings, value floors, significant figures, rounding, fixed precision, units, set membership, sign/absolute values, data types (e.g., integers, floating points, strings), data structures (e.g., vectors, sets, sequences, multi-valued structures, or the like), value ranges, or the like, or combination thereof. Accordingly, in one or more of the various embodiments, this normalization block is not limited to classical arithmetic normalization operations; rather, it is a normalization of data based on the execution of one or more ingestion rules by the ingestion engine to conform the raw data field values to the definitions or constraints required by the targeted model schema.
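
Two of the constraint types mentioned above, a value range of 0.0 to 1.0 and set membership, may be sketched as follows. The function names, the clamping behavior for out-of-range inputs, and the `GENDER_VALUES` mapping are hypothetical choices made for illustration.

```python
def scale_to_unit(value, floor, ceiling):
    """Clamp `value` to [floor, ceiling] and rescale it to the range 0.0 to 1.0."""
    clamped = min(max(value, floor), ceiling)
    return (clamped - floor) / (ceiling - floor)


# Hypothetical mapping of raw spellings onto the common values a schema allows.
GENDER_VALUES = {"m": "male", "male": "male", "f": "female", "female": "female"}


def normalize_gender(raw_value):
    """Map a raw value to a common value, or None if it is not recognized."""
    return GENDER_VALUES.get(raw_value.strip().lower())
```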

At block 1110, in one or more of the various embodiments, the ingestion engine may be arranged to store the model objects and relate them to the data model. In one or more of the various embodiments, the ingestion engine may be arranged to traverse the model schema to identify other model objects that may be examined to see if they are related to other model objects. For example, if a model object is a patient model object, the ingestion engine may traverse the model schema and identify that encounter model objects may be related to patient objects. Accordingly, in this example, the ingestion engine may identify patient objects and encounter objects that should be related to each other. In some embodiments, if such objects are discovered, their relationship may be recorded or indicated as necessary.

At block 1112, in one or more of the various embodiments, optionally, the ML model repository may be arranged to selectively optimize model object storage or data model storage representation based on various characteristics.

In one or more of the various embodiments, while data models may normally be represented using graph models, storage may be modified or arranged to optimize performance or storage size. In one or more of the various embodiments, the ML model repository may be arranged to monitor various usage metrics, such as, frequency of use, number of model objects in query results, size of model objects, data type of model objects (e.g., text, video, images, voice audio, music audio, or the like), age of data, or the like, or combination thereof.

Accordingly, in one or more of the various embodiments, the ML model repository may be arranged to execute one or more optimization policies driven by one or more optimization rules that may take into account the monitored metrics, data store characteristics, expenses, customer service plans, or the like. In some embodiments, one or more optimization policies may operate automatically. In other embodiments, optimization policies may be manually applied or applied after customer or operator approval.

In one or more of the various embodiments, optimizations may includeproviding additional indices, storing some model object in databases,storing some model objects on fast data stores, or the like, orcombination thereof. This block is indicated as optional, because in oneor more of the various embodiments, ML model repositories may bearranged to omit or defer additional optimization operations. Next,control may be returned to a calling process.
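As a non-limiting illustration, an optimization policy of the kind described for block 1112 might be sketched as a small set of rules over monitored metrics. The `UsageMetrics` fields, the numeric thresholds, and the action names below are hypothetical values assumed for the example only.

```python
from dataclasses import dataclass


@dataclass
class UsageMetrics:
    queries_per_day: float   # frequency of use
    avg_result_count: int    # number of model objects in query results
    avg_object_bytes: int    # size of model objects
    data_type: str           # e.g. "text", "video", "images"


def choose_optimizations(metrics, auto_apply=True):
    """Apply hypothetical optimization rules to the monitored metrics."""
    actions = []
    if metrics.queries_per_day >= 100:
        actions.append("add_index")
    if metrics.data_type in {"video", "images"} or metrics.avg_object_bytes > 1_000_000:
        actions.append("store_in_database")
    if metrics.queries_per_day >= 100 and metrics.avg_result_count >= 1000:
        actions.append("move_to_fast_store")
    if auto_apply:
        return actions
    # Policies that need customer or operator approval are only proposed.
    return ["pending_approval:" + action for action in actions]


hot_text = UsageMetrics(queries_per_day=500, avg_result_count=2000,
                        avg_object_bytes=512, data_type="text")
automatic = choose_optimizations(hot_text)
proposed = choose_optimizations(hot_text, auto_apply=False)
```

The `auto_apply` flag stands in for the distinction between policies that operate automatically and those applied only after approval.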

FIG. 12 illustrates a flowchart for process 1200 for employing a machine learning (ML) repository to answer questions in accordance with one or more of the various embodiments. After a start block, at block 1202, in one or more of the various embodiments, one or more questions may be provided to a ML model repository. In one or more of the various embodiments, the questions may be in the form of one or more query expressions, or the like. In some embodiments, the question may be inferred, in the sense that one or more model objects may be provided to the ML model repository for scoring or classification by one or more selected ML models.

In one or more of the various embodiments, the ML model repository may be arranged to provide query information that may correspond to the provided questions. In some embodiments, the question may be a query or otherwise include the necessary query information for processing the question.

In one or more of the various embodiments, query information may be inferred from one or more ML models that the question may be directed towards. For example, in some embodiments, if a ML model is designed and trained to classify or score patients that have two or more children under 10 years old, the query information may be arranged to select patients that have two or more children under the age of 10 based on the ML models.
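As a non-limiting illustration, inferring query information from a ML model's parameter model might be sketched as follows. The dictionary layout of `PARAMETER_MODEL` and the function name `build_query` are hypothetical and assumed only for this example.

```python
def build_query(parameter_model):
    """Turn a declarative parameter model into a predicate over patient objects."""
    relation = parameter_model["relation"]
    age_below = parameter_model["age_below"]
    min_count = parameter_model["min_count"]

    def matches(patient):
        # Count related objects that fall under the age limit.
        young = [c for c in patient.get(relation, []) if c["age"] < age_below]
        return len(young) >= min_count

    return matches


# Hypothetical parameter model: patients with two or more children under 10.
PARAMETER_MODEL = {"relation": "children", "age_below": 10, "min_count": 2}

patients = [
    {"id": "p1", "children": [{"age": 3}, {"age": 7}]},
    {"id": "p2", "children": [{"age": 4}, {"age": 12}]},
    {"id": "p3", "children": []},
]
query = build_query(PARAMETER_MODEL)
selected = [p["id"] for p in patients if query(p)]
```

Here only patient `p1` has two children under the age of 10, so only `p1` would be selected for scoring.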

At block 1206, in one or more of the various embodiments, optionally, the ML answer engine may be arranged to execute one or more optimized queries associated with the one or more ML models or one or more model objects implicated by the query information. For example, if the query information indicates that patient model objects associated with encounter model objects are the subject of the question, there may be one or more indices, query plans, caches, lookup tables, or the like, that have already been prepared in anticipation of fielding such queries. Accordingly, these optimized queries may be executed to select one or more model objects that may be relevant to the current question.

This block is indicated as optional because, in one or more of the various embodiments, optimized queries may be unavailable depending on various factors, such as, the question, query, ML model repository, per customer policies, or the like, or combination thereof.

At block 1208, in one or more of the various embodiments, the ML model repository may be arranged to traverse the data model or model schema to select model objects to provide to one or more ML models for scoring that satisfy the parameter models of one or more ML models or ML model envelopes. In one or more of the various embodiments, a ML answer engine may be arranged to traverse the data model or model schema to select model objects that conform to the query information. In one or more of the various embodiments, model objects so selected will conform to the parameter model defined by the ML models employed for answering the provided question.

Likewise, if one or more model objects have previously been selected or otherwise provided, the data model that includes the model objects may be traversed to compare it with parameter models of one or more ML models or ML model envelopes. Thus, in some embodiments, model objects that satisfy parameter models of one or more ML models or ML model envelopes may be selected for scoring.

At block 1210, in one or more of the various embodiments, the ML model repository may be arranged to provide selected model objects to one or more ML model answer engines, such as, ML model answer engine 329, or the like.

In one or more of the various embodiments, the ML model answer engine may be arranged to provide the data models that include the selected model objects to the one or more ML models or ML model envelopes associated with the question. In some embodiments, the ML model answer engine may be arranged to run one or more ML models included in one or more ML model envelopes. In some embodiments, the ML model answer engine may be arranged to execute one or more ML models in parallel or otherwise concurrently depending on the hosting environment, type of model, data location, ML model affinity, per customer policies or contracts, or the like.

In one or more of the various embodiments, some questions may require a single ML model that may be executed in parallel to classify the provided model objects. For example, multiple instances of a single ML model trained for predicting patient expense based on the provided model objects may be executed simultaneously on different hosts, such that each instance scores a different portion or segment of the provided model objects.
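As a non-limiting illustration, scoring different segments of the provided model objects with multiple instances of a single ML model might be sketched as follows. Worker threads stand in for separate hosts, and `predict_expense` is a hypothetical stand-in for a trained ML model; both are assumptions made only for this example.

```python
from concurrent.futures import ThreadPoolExecutor


def predict_expense(patient):
    """Hypothetical stand-in for a trained patient expense model."""
    return 100.0 * patient["encounters"] + 2.0 * patient["age"]


def score_segment(segment):
    """Score one segment of model objects with one instance of the model."""
    return [(p["id"], predict_expense(p)) for p in segment]


def split(items, n):
    """Split the model objects into n disjoint segments."""
    return [items[i::n] for i in range(n)]


def score_in_parallel(patients, workers=3):
    segments = split(patients, workers)
    # Each worker thread plays the role of a host running one model instance.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(score_segment, segments))
    return {pid: score for segment in results for pid, score in segment}


patients = [{"id": f"p{i}", "age": 30 + i, "encounters": i % 4} for i in range(10)]
scores = score_in_parallel(patients)
```

Because the segments are disjoint, the merged result contains exactly one score per model object regardless of how many instances are used.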

In one or more of the various embodiments, as described above, some ML model envelopes may include two or more ML models. In some cases, these included ML models may be arranged to execute in parallel as well. Accordingly, in one or more of the various embodiments, the ML model answer engine may split the execution of these ML models even though they are included in the same ML model envelope.

In one or more of the various embodiments, ML model envelopes that include more than one ML model may have one or more ML models arranged to depend on the output of one or more other ML models included in the same ML model envelope. For example, the selected model objects may first be scored using a first ML model before being scored by other ML models in the ML model envelope. Accordingly, in one or more of the various embodiments, the ML model answer engine may be arranged to consider these dependencies when selecting where or how to execute ML models. Thus, in one or more of the various embodiments, the ML model answer engine may intentionally assign two or more related or dependent ML models to be scored on the same host, cloud compute instance, or network computer. For example, in some embodiments, scoring two dependent ML models on the same host may avoid copying model objects from one host to another. For example, if a first ML model is used to select a subset of model objects and a dependent ML model is co-located, the ML model repository performance overhead that may be associated with copying the selected subset of model objects to another host for additional scoring by other ML models may be eliminated.
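As a non-limiting illustration, taking dependencies into account when assigning ML models to hosts might be sketched as follows. The placement rule (a dependent ML model is placed on the host of one of its dependencies, while independent ML models are spread across hosts) and all model and host names are assumptions made only for this example.

```python
from graphlib import TopologicalSorter


def plan_execution(dependencies, hosts):
    """Map each ML model to a host, co-locating dependent models.

    dependencies: model name -> set of models whose output it needs.
    """
    placement = {}
    next_host = 0
    # static_order() yields every model after the models it depends on.
    for model in TopologicalSorter(dependencies).static_order():
        deps = sorted(dependencies.get(model, ()))
        if deps:
            # Stay on the host that already holds the upstream output.
            placement[model] = placement[deps[0]]
        else:
            # Independent models are spread round-robin over the hosts.
            placement[model] = hosts[next_host % len(hosts)]
            next_host += 1
    return placement


# Hypothetical envelope contents: "risk" needs "filter", "cost" needs "risk".
dependencies = {
    "filter": set(),
    "risk": {"filter"},
    "cost": {"risk"},
    "readmit": set(),
}
placement = plan_execution(dependencies, ["host-a", "host-b"])
```

In this sketch the chain `filter`, `risk`, `cost` lands on a single host, so the subset selected by `filter` is never copied across the network, while the unrelated `readmit` model may run in parallel on the other host.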

In some embodiments, the ML model answer engine may be arranged to analyze ML model envelopes or ML models to identify portions or segments of the ML models that should be co-located rather than parallelized. For example, ML models may include sub ML models that may be grouped into segments that may benefit from parallelization. In some embodiments, the design of a ML model may include hints or directives that indicate ML model dependencies or the lack thereof. In other embodiments, the ML model repository may be arranged to monitor one or more metrics that may be used by the ML answer engine to determine how to allocate ML models for execution.

In one or more of the various embodiments, the ML model repository may be arranged to monitor one or more metrics, such as, model object copies, data transfers, local queries, or the like, that indicate unidentified ML model dependencies. For example, if one or more metrics indicate that model objects filtered or selected by one ML model are regularly provided to another ML model, the ML answer engine may be arranged to automatically modify an affinity score for the two ML models to indicate that they should be considered for co-locating to avoid copying model objects across the network.
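As a non-limiting illustration, automatically modifying an affinity score from monitored transfer metrics might be sketched as follows. The class name `AffinityTracker`, the transfer threshold, the step size, and the co-location cutoff are hypothetical values assumed only for this example.

```python
from collections import Counter, defaultdict


class AffinityTracker:
    """Raise the affinity of two ML models that regularly exchange model objects."""

    def __init__(self, threshold=3, step=0.1):
        self.threshold = threshold          # transfers before affinity starts to rise
        self.step = step                    # affinity increase per further transfer
        self.transfers = Counter()
        self.affinity = defaultdict(float)

    def record_transfer(self, producer, consumer):
        """Record that objects selected by producer were provided to consumer."""
        key = (producer, consumer)
        self.transfers[key] += 1
        if self.transfers[key] >= self.threshold:
            self.affinity[key] = min(1.0, self.affinity[key] + self.step)

    def should_colocate(self, producer, consumer, cutoff=0.25):
        return self.affinity[(producer, consumer)] >= cutoff


tracker = AffinityTracker()
for _ in range(4):
    tracker.record_transfer("filter", "risk")
after_four = tracker.should_colocate("filter", "risk")
tracker.record_transfer("filter", "risk")
after_five = tracker.should_colocate("filter", "risk")
```

With these assumed values, occasional transfers leave the affinity unchanged, and only a repeated pattern pushes the pair over the cutoff at which co-location would be considered.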

Likewise, in one or more of the various embodiments, other metrics may be employed to predict compute or memory resources required for one or more ML models. Accordingly, in one or more of the various embodiments, ML models may be assigned for execution by hosts that may have capabilities that align closer with the performance characteristics of a given ML model. In some embodiments, assignment may be tempered by per customer policies. For example, in some embodiments, if a ML model answer engine determines that a ML model would benefit from higher performing hosts, a customer's individual contract or policy may require execution of the ML model on a lower performance tier. However, in some embodiments, the ML model repository may be arranged to provide reports or notifications to customers indicating that their ML model performance may benefit from a higher tier of service based on observed metrics.

At block 1212, in one or more of the various embodiments, the ML model repository may be arranged to provide one or more reports based on the ML model answers. In one or more of the various embodiments, after the questions provided to the ML model answer engine have been answered, the ML model repository may provide a report that includes the model output associated with the ML models used to answer the question. In some embodiments, the report may be provided in various forms or formats, including, text files, spreadsheets, XML files, or the like. In some embodiments, the report may be formatted to conform to the requirements of one or more reporting or visualization services, or the like. Next, control may be returned to a calling process.

FIG. 13 illustrates a flowchart for process 1300 for employing a machine learning (ML) repository to answer questions in accordance with one or more of the various embodiments. After a start block, at block 1302, in one or more of the various embodiments, the ML model repository may be arranged to provide a question and one or more ML model envelopes that include one or more ML models to a ML model answer engine.

At block 1304, in one or more of the various embodiments, the ML model answer engine may be arranged to traverse a data model based on the question and the ML model parameter model of the one or more ML models. In some embodiments, if a question requires scoring patient model objects using a particular ML model, the ML model answer engine may be arranged to traverse the data model or model schema to build query information necessary for complying with the input parameter model. For example, if the parameter model is satisfied by model objects representing patients that live in Seattle and have had an encounter (appointment), the model schema may be traversed to provide query information to meet this requirement. In some cases, the model schema or data model may have intervening objects (vertices or nodes) that may need to be traversed to identify the correct query information.
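As a non-limiting illustration, traversing a model schema through intervening vertices to build query information might be sketched with a breadth-first search. The schema graph below, including the intervening `address` and `care_plan` vertices, and the shape of the returned query information are hypothetical and assumed only for this example.

```python
from collections import deque

# Hypothetical model schema graph: vertex -> directly related vertices.
SCHEMA_EDGES = {
    "patient": ["address", "care_plan"],
    "address": ["city"],
    "care_plan": ["encounter"],
    "city": [],
    "encounter": [],
}


def find_path(schema, start, goal):
    """Breadth-first search for the shortest schema path from start to goal."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in schema.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


def build_query_info(schema, root, requirements):
    """requirements: target vertex -> constraint, or None for mere existence."""
    info = []
    for target, constraint in requirements.items():
        path = find_path(schema, root, target)
        if path is None:
            raise ValueError(f"{target!r} is not reachable from {root!r}")
        info.append({"path": path, "constraint": constraint})
    return info


# Patients that live in Seattle and have had an encounter.
query_info = build_query_info(
    SCHEMA_EDGES, "patient", {"city": {"name": "Seattle"}, "encounter": None}
)
```

Each entry of `query_info` records the traversal path, including any intervening vertices, together with the constraint to apply at the end of the path.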

At block 1306, in one or more of the various embodiments, the ML model repository may be arranged to execute a query using the query information to discover model objects eligible for scoring. As discussed above, the ML model repository may be arranged to execute various queries to identify model objects for scoring.

At block 1308, in one or more of the various embodiments, the model objects and ML models may be provided to the ML model answer engine. As discussed above, the ML model answer engine may allocate or provide model objects and ML models to various hosts for execution to determine scores in response to the question. Next, control may be returned to a calling process.

FIG. 14 illustrates a flowchart for process 1400 for employing a machine learning (ML) repository to answer questions in accordance with one or more of the various embodiments. After a start block, at block 1402, in one or more of the various embodiments, the ML model repository may be arranged to provide one or more model objects to a ML model envelope or ML model. In one or more of the various embodiments, a ML model answer engine may be arranged to accept one or more model objects and one or more ML models. In some embodiments, because one or more of the model objects may be incompatible with the ML model, the model objects may be represented using a model object path corresponding to the model schema or data model rather than the model objects themselves.

At block 1404, in one or more of the various embodiments, the ML model repository may be arranged to compare the model object traversal paths in the data model or model schema with the parameter model. In one or more of the various embodiments, the comparison may include comparing paths in the model schema that may correspond to the ML model's parameters. In one or more of the various embodiments, the comparison may check if the provided model objects include the model object fields or relationships to other model objects included in the parameter model.

At decision block 1406, in one or more of the various embodiments, if the model objects satisfy the ML model's parameter model requirements, control may flow to block 1408; otherwise, control may be returned to a calling process.
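As a non-limiting illustration, the comparison of blocks 1404 and 1406 might be sketched by collecting the field and relationship paths present in a model object and checking that they cover the paths required by a parameter model. The nested-dictionary representation of model objects and the tuple representation of paths are assumptions made only for this example.

```python
def object_paths(obj, prefix=()):
    """Collect every field or relationship path present in a nested model object."""
    paths = set()
    for key, value in obj.items():
        path = prefix + (key,)
        paths.add(path)
        if isinstance(value, dict):
            paths |= object_paths(value, path)
        elif isinstance(value, list):
            # A list models a relationship to several other model objects.
            for item in value:
                if isinstance(item, dict):
                    paths |= object_paths(item, path)
    return paths


def satisfies(obj, parameter_model):
    """True if every path required by the parameter model exists in the object."""
    return set(parameter_model) <= object_paths(obj)


# Hypothetical parameter model: an age field and at least one dated encounter.
PARAMETER_MODEL = [("age",), ("encounters", "date")]

with_encounter = {"id": "p1", "age": 41, "encounters": [{"date": "2017-10-31"}]}
without_encounter = {"id": "p2", "age": 35, "encounters": []}

eligible = [p["id"] for p in (with_encounter, without_encounter)
            if satisfies(p, PARAMETER_MODEL)]
```

Only model objects for which `satisfies` returns true would flow on to block 1408 for execution of the ML model.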

At block 1408, in one or more of the various embodiments, the ML model answer engine may be arranged to execute the ML model using the model objects that satisfy its parameter model. As discussed above, the ML model answer engine may be arranged to allocate ML model envelopes, ML models, model objects, or the like, to one or more hosts for execution. Next, control may be returned to a calling process.

It will be understood that each block of the flowchart illustrations, and combinations of blocks in the flowchart illustrations, can be implemented by computer program instructions. These program instructions may be provided to a processor to produce a machine, such that the instructions, which execute on the processor, create means for implementing the actions specified in the flowchart block or blocks. The computer program instructions may be executed by a processor to cause a series of operational steps to be performed by the processor to produce a computer-implemented process such that the instructions execute on the processor to provide steps for implementing the actions specified in the flowchart block or blocks. The computer program instructions may also cause one or more of the operational steps shown in the blocks of the flowcharts to be performed in parallel. Moreover, some of the steps may also be performed across more than one processor, such as might arise in a multi-processor computer system. In addition, one or more blocks in the flowchart illustration may also be performed concurrently with another one or more blocks, or even in a different sequence than illustrated without departing from the scope or spirit of the invention.

Additionally, one or more steps or blocks may be implemented using embedded logic hardware, such as an Application Specific Integrated Circuit (ASIC), Field Programmable Gate Array (FPGA), Programmable Array Logic (PAL), or the like, instead of a computer program. The embedded logic hardware may directly execute embedded logic to perform some or all of the actions in one or more steps or blocks. Also, in one or more embodiments (not shown in the figures), some or all of the actions of one or more of the steps or blocks may be performed by a hardware microcontroller instead of a CPU. In one or more embodiments, the microcontroller may directly execute its own embedded logic to perform actions and access its own internal memory and its own external Input and Output Interfaces (e.g., hardware pins or wireless transceivers) to perform actions, such as System On a Chip (SOC) or the like.

What is claimed as new and desired to be protected by Letters Patent of the United States is:
 1. A method for managing data over a network using one or more processors, included in one or more network computers, to perform actions, comprising: instantiating an answer engine to perform further actions, including: receiving one or more questions and one or more model objects, wherein the one or more model objects are part of a data model that conforms to a model schema; receiving a plurality of machine learning (ML) model envelopes based on the one or more questions; comparing the data model to parameter models that are associated with each of the plurality of ML model envelopes, wherein the comparison includes a traversal of the data model and one or more of the parameter models; selecting one or more of the plurality of ML model envelopes based on the comparison, wherein one or more traversal paths corresponding to the one or more model objects satisfy the parameter models of each of the selected one or more ML model envelopes; executing one or more ML models included in each selected ML model envelope to provide score values for the one or more model objects, wherein the score values are included in a report; and providing selective optimization of one or more of performance or storage size for one or more ML model repositories based on one or more characteristics including one or more of model object usage frequency, number of model objects in a query result, model object size, or model object data type, wherein the one or more selective optimizations include one or more of indices to improve identification of each model object to be omitted from the one or more traversal paths, or storing a portion of the one or more model objects in a database or a fast data store.